:meta-keywords: select statement, select from clause, select where clause, group by, having clause, limit clause, join query, subquery, select for update :meta-description: The SELECT statement specifies columns that you want to retrieve from a table. ****** SELECT ****** The **SELECT** statement specifies columns that you want to retrieve from a table. :: SELECT [ ] [{TO | INTO} ] [FROM ] [WHERE ] [GROUP BY {col_name | expr} [ASC | DESC], ...[WITH ROLLUP]] [HAVING ] [ORDER BY {col_name | expr} [ASC | DESC], ... [NULLS {FIRST | LAST}] [LIMIT [offset,] row_count] [USING INDEX { index_name [,index_name, ...] | NONE }] [FOR UPDATE [OF ]] ::= ALL | DISTINCT | DISTINCTROW | UNIQUE ::= * | | *, ::= [:] identifier, [:] identifier, ... ::= [ {, } ... | ... | ... ] ::= [] | [ ] | | TABLE ( ) ::= [AS] [()] ::= [ONLY] | ALL [ EXCEPT ] ::= CLASS ::= { [INNER | {LEFT | RIGHT} [OUTER]] JOIN } ON ::= { CROSS JOIN | NATURAL [ LEFT | RIGHT ] JOIN } * *qualifier*: A qualifier. When omitted, it is set to **ALL**. * **ALL**: Retrieves all records of the table. * **DISTINCT**: Retrieves only records with unique values without allowing duplicates. **DISTINCT**, **DISTINCTROW**, and **UNIQUE** are used interchangeably. * <*select_expressions*> * \*: By using **SELECT** * statement, you can retrieve all columns from the table specified in the **FROM** clause. * *expression_comma_list*: *expression* can be a path expression (ex.: *tbl_name.col_name*), variable or table name. All general expressions including arithmetic operations can also be used. Use a comma (,) to separate each expression in the list. You can specify aliases by using the **AS** keyword for columns or expressions to be queried. Specified aliases are used as column names in **GROUP BY**, **HAVING** and **ORDER BY** clauses. The position index of a column is assigned based on the order in which the column was specified. The starting value is 1. As **AVG**, **COUNT**, **MAX**, **MIN**, or **SUM**, an aggregate function that manipulates the retrieved data can also be used in the *expression*. * *table_name*.\*: Specifies the table name and using \* has the same effect as specifying all columns for the given table. * *variable_comma_list*: The data retrieved by the *select_expressions* can be stored in more than one variable. * [:]\ *identifier*: By using the *:identifier* after **TO** (or **INTO**), you can store the data to be retrieved in the ':identifier' variable. * <*single_table_spec*> * If a superclass name is specified after the **ONLY** keyword, only the superclass, not the subclass inheriting from it, is selected. * If a superclass name is specified after the **ALL** keyword, the superclass as well as the subclass inheriting from it are both selected. * You can define the list of subclass not to be selected after the **EXCEPT** keyword. The following example shows how to retrieve host countries of the Olympic Games without any duplicates. This example is performed on the *olympic* table of *demodb*. The **DISTINCT** or **UNIQUE** keyword makes the query result unique. For example, when there are multiple *olympic* records of which each *host_nation* value is 'Greece', you can use such keywords to display only one value in the query result. .. code-block:: sql SELECT DISTINCT host_nation FROM olympic; :: host_nation ====================== 'Australia' 'Belgium' 'Canada' 'Finland' 'France' ... The following example shows how to define an alias to a column to be queried and sort the result record by using the column alias in the **ORDER BY** clause. At this time, the number of the result records is limited to 5 by using the **LIMIT** clause. .. code-block:: sql SELECT host_year as col1, host_nation as col2 FROM olympic ORDER BY col2 LIMIT 5; :: col1 col2 =================================== 2000 'Australia' 1956 'Australia' 1920 'Belgium' 1976 'Canada' 1948 'England' .. code-block:: sql SELECT CONCAT(host_nation, ', ', host_city) AS host_place FROM olympic ORDER BY host_place LIMIT 5; :: host_place ====================== 'Australia, Melbourne' 'Australia, Sydney' 'Belgium, Antwerp' 'Canada, Montreal' 'England, London' FROM Clause =========== The **FROM** clause specifies the table in which data is to be retrieved in the query. If no table is referenced, the **FROM** clause can be omitted. Retrieval paths are as follows: * Single table * Subquery * Derived table :: SELECT [] [ FROM [ {, | }... ] ] ::= * | | *, ::= [] | [] | | TABLE () ::= [AS] [()] ::= [ONLY] | ALL [EXCEPT ] ::= CLASS * <*select_expressions*>: One or more columns or expressions to query is specified. Use * to query all columns in the table. You can also specify an alias for a column or an expression to be queried by using the AS keyword. This keyword can be used in **GROUP BY**, **HAVING** and **ORDER BY** clauses. The position index of the column is given according to the order in which the column was specified. The starting value is 1. * <*table_specification*>: At least one table name is specified after the **FROM** clause. Subqueries and derived tables can also be used in the **FROM** clause. For details on subquery derived tables, see :ref:`subquery-derived-table`. .. code-block:: sql --FROM clause can be omitted in the statement SELECT 1+1 AS sum_value; :: sum_value ============= 2 .. code-block:: sql SELECT CONCAT('CoraDB', '2008' , 'R3.0') AS db_version; :: db_version ====================== 'CoraDB2008R3.0' Derived Table ------------- In the query statement, subqueries can be used in the table specification of the **FROM** clause. Such subqueries create derived tables where subquery results are treated as tables. A correlation specification must be used when a subquery that creates a derived table is used. Derived tables are also used to access the individual element of an attribute that has a set value. In this case, an element of the set value is created as an instance in the derived table. .. _subquery-derived-table: Subquery Derived Table ---------------------- Each instance in the derived table is created from the result of the subquery in the **FROM** clause. A derived table created form a subquery can have any number of columns and records. :: FROM (subquery) [AS] [derived_table_name [(column_name [{, column_name } ... ])]] * The number of *column_name* and the number of columns created by the *subquery* must be identical. * *derived_table_name* can be omitted. The following example shows how to retrieve the sum of the number of gold (*gold*) medals won by Korea and that of silver medals won by Japan. This example shows a way of getting an intermediate result of the subquery and processing it as a single result, by using a derived table. The query returns the sum of the *gold* values whose *nation_code* is 'KOR' and the *silver* values whose *nation_code* column is 'JPN'. .. code-block:: sql SELECT SUM (n) FROM (SELECT gold FROM participant WHERE nation_code = 'KOR' UNION ALL SELECT silver FROM participant WHERE nation_code = 'JPN') AS t(n); Subquery derived tables can be useful when combined with outer queries. For example, a derived table can be used in the **FROM** clause of the subquery used in the **WHERE** clause. The following example shows *nation_code*, *host_year* and *gold* records whose number of gold medals is greater than average sum of the number of silver and bronze medals when one or more silver or bronze medals were won. In this example, the query (the outer **SELECT** clause) and the subquery (the inner **SELECT** clause) share the *nation_code* attribute. .. code-block:: sql SELECT nation_code, host_year, gold FROM participant p WHERE gold > (SELECT AVG(s) FROM (SELECT silver + bronze FROM participant WHERE nation_code = p.nation_code AND silver > 0 AND bronze > 0) AS t(s)); :: nation_code host_year gold ========================================= 'JPN' 2004 16 'CHN' 2004 32 'DEN' 1996 4 'ESP' 1992 13 .. _where-clause: WHERE Clause ============ In a query, a column can be processed based on conditions. The **WHERE** clause specifies a search condition for data. :: WHERE ::= The **WHERE** clause specifies a condition that determines the data to be retrieved by *search_condition* or a query. Only data for which the condition is true is retrieved for the query results. (**NULL** value is not retrieved for the query results because it is evaluated as unknown value.) * *search_condition*: It is described in detail in the following sections. * :ref:`basic-cond-expr` * :ref:`between-expr` * :ref:`exists-expr` * :ref:`in-expr` * :ref:`is-null-expr` * :ref:`like-expr` * :ref:`any-some-all-expr` The logical operator **AND** or **OR** can be used for multiple conditions. If **AND** is specified, all conditions must be true. If **OR** is specified, only one needs to be true. If the keyword **NOT** is preceded by a condition, the meaning of the condition is reserved. The following table shows the order in which logical operators are evaluated. +--------------+--------------+---------------------------------------------------------------+ | Priority | Operator | Function | +==============+==============+===============================================================+ | 1 | **()** | Logical expressions in parentheses are evaluated first. | +--------------+--------------+---------------------------------------------------------------+ | 2 | **NOT** | Negates the result of the logical expression. | +--------------+--------------+---------------------------------------------------------------+ | 3 | **AND** | All conditions in the logical expression must be true. | +--------------+--------------+---------------------------------------------------------------+ | 4 | **OR** | One of the conditions in the logical expression must be true. | +--------------+--------------+---------------------------------------------------------------+ .. _group-by-clause: GROUP BY ... HAVING Clause ========================== The **GROUP BY** clause is used to group the result retrieved by the **SELECT** statement based on a specific column. This clause is used to sort by group or to get the aggregation by group using the aggregation function. Herein, a group consists of records that have the same value for the column specified in the **GROUP BY** clause. You can also set a condition for group selection by including the **HAVING** clause after the **GROUP BY** clause. That is, only groups satisfying the condition specified by the **HAVING** clause are queried out of all groups that are grouped by the **GROUP BY** clause. By SQL standard, you cannot specify a column (hidden column) not defined in the **GROUP BY** clause to the SELECT column list. However, by using extended CoraDB grammars, you can specify the hidden column to the SELECT column list. If you do not use the extended CoraDB grammars, the **only_full_group_by** parameter should be set to **yes**. For details, see :ref:`stmt-type-parameters`. :: SELECT ... GROUP BY {col_name | expr | position} [ASC | DESC], ... [WITH ROLLUP] [HAVING ] * *col_name* | *expr* | *position*: Specifies one or more column names, expressions, aliases or column location. Items are separated by commas. Columns are sorted on this basis. * [**ASC** | **DESC**]: Specifies the **ASC** or **DESC** sorting option after the columns specified in the **GROUP BY** clause. If the sorting option is not specified, the default value is **ASC**. * <*search_condition*>: Specifies the search condition in the **HAVING** clause. In the **HAVING** clause, you can refer to columns and aliases specified in the **GROUP BY** clause, or columns used in aggregate functions. .. note:: Even the hidden columns not specified in the **GROUP BY** clause can be referred to, if the value of the **only_full_group_by** parameter is set to **yes**. At this time, the HAVING condition does not affect to the query result. * **WITH ROLLUP**: If you specify the **WITH ROLLUP** modifier in the **GROUP BY** clause, the aggregate information of the result value of each GROUPed BY column is displayed for each group, and the total of all result rows is displayed at the last row. When a **WITH ROLLUP** modifier is defined in the **GROUP BY** clause, the result value for all rows of the group is additionally displayed. In other words, total aggregation is made for the value aggregated by group. When there are two columns for Group By, the former is considered as a large unit and the latter is considered as a small unit, so the total aggregation row for the small unit and the total aggregation row for the large unit are added. For example, you can check the aggregation of the sales result per department and salesperson through one query. .. code-block:: sql -- creating a new table CREATE TABLE sales_tbl (dept_no INT, name VARCHAR(20), sales_month INT, sales_amount INT DEFAULT 100, PRIMARY KEY (dept_no, name, sales_month)); INSERT INTO sales_tbl VALUES (201, 'George' , 1, 450), (201, 'George' , 2, 250), (201, 'Laura' , 1, 100), (201, 'Laura' , 2, 500), (301, 'Max' , 1, 300), (301, 'Max' , 2, 300), (501, 'Stephan', 1, 300), (501, 'Stephan', 2, DEFAULT), (501, 'Chang' , 1, 150),(501, 'Chang' , 2, 150), (501, 'Sue' , 1, 150), (501, 'Sue' , 2, 200); -- selecting rows grouped by dept_no SELECT dept_no, avg(sales_amount) FROM sales_tbl GROUP BY dept_no; :: dept_no avg(sales_amount) ======================================= 201 3.250000000000000e+02 301 3.000000000000000e+02 501 1.750000000000000e+02 .. code-block:: sql -- conditions in WHERE clause operate first before GROUP BY SELECT dept_no, avg(sales_amount) FROM sales_tbl WHERE sales_amount > 100 GROUP BY dept_no; :: dept_no avg(sales_amount) ======================================= 201 4.000000000000000e+02 301 3.000000000000000e+02 501 1.900000000000000e+02 .. code-block:: sql -- conditions in HAVING clause operate last after GROUP BY SELECT dept_no, avg(sales_amount) FROM sales_tbl WHERE sales_amount > 100 GROUP BY dept_no HAVING avg(sales_amount) > 200; :: dept_no avg(sales_amount) ======================================= 201 4.000000000000000e+02 301 3.000000000000000e+02 .. code-block:: sql -- selecting and sorting rows with using column alias SELECT dept_no AS a1, avg(sales_amount) AS a2 FROM sales_tbl WHERE sales_amount > 200 GROUP BY a1 HAVING a2 > 200 ORDER BY a2; :: a1 a2 ======================================= 301 3.000000000000000e+02 501 3.000000000000000e+02 201 4.000000000000000e+02 .. code-block:: sql -- selecting rows grouped by dept_no, name with WITH ROLLUP modifier SELECT dept_no AS a1, name AS a2, avg(sales_amount) AS a3 FROM sales_tbl WHERE sales_amount > 100 GROUP BY a1, a2 WITH ROLLUP; :: a1 a2 a3 ============================================================= 201 'George' 3.500000000000000e+02 201 'Laura' 5.000000000000000e+02 201 NULL 4.000000000000000e+02 301 'Max' 3.000000000000000e+02 301 NULL 3.000000000000000e+02 501 'Chang' 1.500000000000000e+02 501 'Stephan' 3.000000000000000e+02 501 'Sue' 1.750000000000000e+02 501 NULL 1.900000000000000e+02 NULL NULL 2.750000000000000e+02 .. _order-by-clause: ORDER BY Clause =============== The **ORDER BY** clause sorts the query result set in ascending or descending order. If you do not specify a sorting option such as **ASC** or **DESC**, the result set in ascending order by default. If you do not specify the **ORDER BY** clause, the order of records to be queried may vary depending on query. :: SELECT ... ORDER BY {col_name | expr | position} [ASC | DESC], ...] [NULLS {FIRST | LAST}] * *col_name* | *expr* | *position*: Specifies a column name, expression, alias, or column location. One or more column names, expressions or aliases can be specified. Items are separated by commas. A column that is not specified in the list of **SELECT** columns can be specified. * [**ASC** | **DESC**]: **ASC** means sorting in ascending order, and **DESC** is sorting in descending order. If the sorting option is not specified, the default value is **ASC**. * [**NULLS** {**FIRST** | **LAST**}]: **NULLS FIRST** sorts NULL at first, **NULLS LAST** sorts NULL at last. If this syntax is omitted, **ASC** sorts NULL at first, **DESC** sorts NULL at last. .. code-block:: sql -- selecting rows sorted by ORDER BY clause SELECT * FROM sales_tbl ORDER BY dept_no DESC, name ASC; :: dept_no name sales_month sales_amount ============================================================== 501 'Chang' 1 150 501 'Chang' 2 150 501 'Stephan' 1 300 501 'Stephan' 2 100 501 'Sue' 1 150 501 'Sue' 2 200 301 'Max' 1 300 301 'Max' 2 300 201 'George' 1 450 201 'George' 2 250 201 'Laura' 1 100 201 'Laura' 2 500 .. code-block:: sql -- sorting reversely and limiting result rows by LIMIT clause SELECT dept_no AS a1, avg(sales_amount) AS a2 FROM sales_tbl GROUP BY a1 ORDER BY a2 DESC LIMIT 3; :: a1 a2 ======================================= 201 3.250000000000000e+02 301 3.000000000000000e+02 501 1.750000000000000e+02 The following is an example how to specify the NULLS FIRST or NULLS LAST after ORDER BY clause. .. code-block:: sql CREATE TABLE tbl (a INT, b VARCHAR); INSERT INTO tbl VALUES (1,NULL), (2,NULL), (3,'AB'), (4,NULL), (5,'AB'), (6,NULL), (7,'ABCD'), (8,NULL), (9,'ABCD'), (10,NULL); .. code-block:: sql SELECT * FROM tbl ORDER BY b NULLS FIRST; :: a b =================================== 1 NULL 2 NULL 4 NULL 6 NULL 8 NULL 10 NULL 3 'ab' 5 'ab' 7 'abcd' 9 'abcd' .. code-block:: sql SELECT * FROM tbl ORDER BY b NULLS LAST; :: a b =================================== 3 'ab' 5 'ab' 7 'abcd' 9 'abcd' 1 NULL 2 NULL 4 NULL 6 NULL 8 NULL 10 NULL .. note:: **Translation of GROUP BY alias** .. code-block:: sql CREATE TABLE t1(a INT, b INT, c INT); INSERT INTO t1 VALUES(1,1,1); INSERT INTO t1 VALUES(2,NULL,2); INSERT INTO t1 VALUES(2,2,2); SELECT a, NVL(b,2) AS b FROM t1 GROUP BY a, b; -- Q1 When you run the above SELECT query, "GROUP BY a, b" is translated as: * "GROUP BY a, NVL(b, 2)"(alias name b) in 9.2 or before. The result is the same as Q2's result as below. .. code-block:: sql SELECT a, NVL(b,2) AS bxxx FROM t1 GROUP BY a, bxxx; -- Q2 :: a b ====================== 1 1 2 2 * "GROUP BY a, b"(column name b) in 9.3 or higher. The result is the same as Q3's result as below. .. code-block:: sql SELECT a, NVL(b,2) AS bxxx FROM t1 GROUP BY a, b; -- Q3 :: a b ====================== 1 1 2 2 2 2 .. _limit-clause: LIMIT Clause ============ The **LIMIT** clause can be used to limit the number of records displayed. You can specify a very big integer for *row_count* to display to the last row, starting from a specific row. The **LIMIT** clause can be used as a prepared statement. In this case, the bind parameter (?) can be used instead of an argument. **INST_NUM** () and **ROWNUM** cannot be included in the **WHERE** clause in a query that contains the **LIMIT** clause. Also, **LIMIT** cannot be used together with **HAVING GROUPBY_NUM** (). :: LIMIT {[offset,] row_count | row_count [OFFSET offset]} ::= ::= ::= | + | - ::= | * | / ::= | | ( ) * *offset*: Specifies the offset of the starting row to be displayed. The offset of the starting row of the result set is 0; it can be omitted and the default value is **0**. It can be one of unsigned int, a host variable or a simple expression. * *row_count*: Specifies the number of records to be displayed. It can be one of unsigned integer, a host variable or a simple expression. .. code-block:: sql -- LIMIT clause can be used in prepared statement PREPARE stmt FROM 'SELECT * FROM sales_tbl LIMIT ?, ?'; EXECUTE stmt USING 0, 10; .. code-block:: sql -- selecting rows with LIMIT clause SELECT * FROM sales_tbl WHERE sales_amount > 100 LIMIT 5; :: dept_no name sales_month sales_amount ============================================================== 201 'George' 1 450 201 'George' 2 250 201 'Laura' 2 500 301 'Max' 1 300 301 'Max' 2 300 .. code-block:: sql -- LIMIT clause can be used in subquery SELECT t1.* FROM (SELECT * FROM sales_tbl AS t2 WHERE sales_amount > 100 LIMIT 5) AS t1 LIMIT 1,3; -- above query and below query shows the same result SELECT t1.* FROM (SELECT * FROM sales_tbl AS t2 WHERE sales_amount > 100 LIMIT 5) AS t1 LIMIT 3 OFFSET 1; :: dept_no name sales_month sales_amount ============================================================== 201 'George' 2 250 201 'Laura' 2 500 301 'Max' 1 300 .. code-block:: sql -- LIMIT clause allows simple expressions for both offset and row_count SELECT * FROM sales_tbl WHERE sales_amount > 100 LIMIT ? * ?, (? * ?) + ?; .. _join-query: Join Query ========== A join is a query that combines the rows of two or more tables or virtual tables (views). In a join query, a condition that compares the columns that are common in two or more tables is called a join condition. Rows are retrieved from each joined table, and are combined only when they satisfy the specified join condition. A join query using an equality operator (=) is called an equi-join, and one without any join condition is called a cartesian product. Meanwhile, joining a single table is called a self join. In a self join, table **ALIAS** is used to distinguish columns, because the same table is used twice in the **FROM** clause. A join that outputs only rows that satisfy the join condition from a joined table is called an inner or a simple join, whereas a join that outputs both rows that satisfy and do not satisfy the join condition from a joined table is called an outer join. An outer join is divided into a left outer join which outputs all rows of the left table as a result(outputs NULL when the right table's columns don't match conditions), a right outer join which outputs all rows of the right table as a result(outputs NULL when the left table's columns don't match conditions) and a full outer join which outputs all rows of both tables. If there is no column value that corresponds to a table on one side in the result of an outer join query, all rows are returned as **NULL**. :: FROM [{, | { | } ...] ::= [] | [] | | TABLE () ::= { [INNER | {LEFT | RIGHT} [OUTER]] JOIN } ON ::= { CROSS JOIN | NATURAL [ LEFT | RIGHT ] JOIN } * <*join_table_specification*> * [**INNER**] **JOIN**: Used for inner join and requires join conditions. * {**LEFT** | **RIGHT**} [**OUTER**] **JOIN**: **LEFT** is used for a left outer join query, and **RIGHT** is for a right outer join query. * <*join_table_specification2*> * **CROSS JOIN**: Used for cross join and requires no join conditions. * **NATURAL** [**LEFT** | **RIGHT**] **JOIN**: Used for natural join and join condition is not used. It operates in the equivalent same way to have a condition between columns equivalent of the same name . Inner Join ---------- The inner join requires join conditions. The **INNER JOIN** keyword can be omitted. When it is omitted, the table is separated by a comma (,). The **ON** join condition can be replaced with the **WHERE** condition. The following example shows how to retrieve the years and host countries of the Olympic Games since 1950 where a world record has been set. The following query retrieves instances whose values of the *host_year* column in the *history* table are greater than 1950. The following two queries output the same result. .. code-block:: sql SELECT DISTINCT h.host_year, o.host_nation FROM history h INNER JOIN olympic o ON h.host_year = o.host_year AND o.host_year > 1950; SELECT DISTINCT h.host_year, o.host_nation FROM history h, olympic o WHERE h.host_year = o.host_year AND o.host_year > 1950; :: host_year host_nation =================================== 1968 'Mexico' 1980 'U.S.S.R.' 1984 'United States of America' 1988 'Korea' 1992 'Spain' 1996 'United States of America' 2000 'Australia' 2004 'Greece' Outer Join ---------- CoraDB does not support full outer joins; it supports only left and right joins. Path expressions that include subqueries and sub-columns cannot be used in the join conditions of an outer join. Join conditions of an outer join are specified in a different way from those of an inner join. In an inner join, join conditions can be expressed in the **WHERE** clause; in an outer join, they appear after the **ON** keyword within the **FROM** clause. Other retrieval conditions can be used in the **WHERE** or **ON** clause, but the retrieval result depends on whether the condition is used in the **WHERE** or **ON** clause. The table execution order is fixed according to the order specified in the **FROM** clause. Therefore, when using an outer join, you should create a query statement in consideration of the table order. It is recommended to use standard statements using { **LEFT** | **RIGHT** } [ **OUTER** ] **JOIN**, because using an Oracle-style join query statements by specifying an outer join operator **(+)** in the **WHERE** clause, even if possible, might lead the execution result or plan in an unwanted direction. The following example shows how to retrieve the years and host countries of the Olympic Games since 1950 where a world record has been set, but including the Olympic Games where any world records haven't been set in the result. This example can be expressed in the following right outer join query. In this example, all instances whose values of the *host_year* column in the *history* table are not greater than 1950 are also retrieved. All instances of *host_nation* are included because this is a right outer join. *host_year* that does not have a value is represented as **NULL**. .. code-block:: sql SELECT DISTINCT h.host_year, o.host_year, o.host_nation FROM history h RIGHT OUTER JOIN olympic o ON h.host_year = o.host_year WHERE o.host_year > 1950; :: host_year host_year host_nation ================================================ NULL 1952 'Finland' NULL 1956 'Australia' NULL 1960 'Italy' NULL 1964 'Japan' NULL 1972 'Germany' NULL 1976 'Canada' 1968 1968 'Mexico' 1980 1980 'USSR' 1984 1984 'USA' 1988 1988 'Korea' 1992 1992 'Spain' 1996 1996 'USA' 2000 2000 'Australia' 2004 2004 'Greece' A right outer join query can be converted to a left outer join query by switching the position of two tables in the **FROM** clause. The right outer join query in the previous example can be expressed as a left outer join query as follows: .. code-block:: sql SELECT DISTINCT h.host_year, o.host_year, o.host_nation FROM olympic o LEFT OUTER JOIN history h ON h.host_year = o.host_year WHERE o.host_year > 1950; :: host_year host_year host_nation ================================================ NULL 1952 'Finland' NULL 1956 'Australia' NULL 1960 'Italy' NULL 1964 'Japan' NULL 1972 'Germany' NULL 1976 'Canada' 1968 1968 'Mexico' 1980 1980 'USSR' 1984 1984 'USA' 1988 1988 'Korea' 1992 1992 'Spain' 1996 1996 'USA' 2000 2000 'Australia' 2004 2004 'Greece' Outer joins can also be represented by using **(+)** in the **WHERE** clause. The above example is a query that has the same meaning as the example using the **LEFT** **OUTER** **JOIN**. The **(+)** syntax is not ISO/ANSI standard, so it can lead to ambiguous situations. It is recommended to use the standard syntax **LEFT** **OUTER** **JOIN** (or **RIGHT** **OUTER** **JOIN**) if possible. .. code-block:: sql SELECT DISTINCT h.host_year, o.host_year, o.host_nation FROM history h, olympic o WHERE o.host_year = h.host_year(+) AND o.host_year > 1950; :: host_year host_year host_nation ================================================ NULL 1952 'Finland' NULL 1956 'Australia' NULL 1960 'Italy' NULL 1964 'Japan' NULL 1972 'Germany' NULL 1976 'Canada' 1968 1968 'Mexico' 1980 1980 'USSR' 1984 1984 'USA' 1988 1988 'Korea' 1992 1992 'Spain' 1996 1996 'USA' 2000 2000 'Australia' 2004 2004 'Greece' In the above examples, *h.host_year=o.host_year* is an outer join condition, and *o.host_year > 1950* is a search condition. If the search condition is not written in the **WHERE** clause but in the **ON** clause, the meaning and the result will be different. The following query also includes instances whose values of *o.host_year* are not greater than 1950. .. code-block:: sql SELECT DISTINCT h.host_year, o.host_year, o.host_nation FROM olympic o LEFT OUTER JOIN history h ON h.host_year = o.host_year AND o.host_year > 1950; :: host_year host_year host_nation ================================================ NULL 1896 'Greece' NULL 1900 'France' NULL 1904 'USA' NULL 1908 'United Kingdom' NULL 1912 'Sweden' NULL 1920 'Belgium' NULL 1924 'France' NULL 1928 'Netherlands' NULL 1932 'USA' NULL 1936 'Germany' NULL 1948 'England' NULL 1952 'Finland' NULL 1956 'Australia' NULL 1960 'Italy' NULL 1964 'Japan' NULL 1972 'Germany' NULL 1976 'Canada' 1968 1968 'Mexico' 1980 1980 'USSR' 1984 1984 'USA' 1988 1988 'Korea' 1992 1992 'Spain' 1996 1996 'USA' 2000 2000 'Australia' 2004 2004 'Greece' In the above example, **LEFT OUTER JOIN** should attach all rows to the result rows even if the left table's rows do not match to the condition; therefore, the left table's condition, "AND o.host_year > 1950" is ignored. But "WHERE o.host_year > 1950" is applied after the join operation is completed. Please consider that a condition after **ON** clause and a condition after **WHERE** clause can be applied differently in **OUTER JOIN**. Cross Join ---------- The cross join is a cartesian product, meaning that it is a combination of two tables, without any condition. For the cross join, the **CROSS JOIN** keyword can be omitted. When it is omitted, the table is separated by a comma (,). The following example shows how to write cross join. .. code-block:: sql SELECT DISTINCT h.host_year, o.host_nation FROM history h CROSS JOIN olympic o; SELECT DISTINCT h.host_year, o.host_nation FROM history h, olympic o; The above two queries output the same results. :: host_year host_nation =================================== 1968 'Australia' 1968 'Belgium' 1968 'Canada' 1968 'England' 1968 'Finland' 1968 'France' 1968 'Germany' ... 2004 'Spain' 2004 'Sweden' 2004 'USA' 2004 'USSR' 2004 'United Kingdom' 144 rows selected. (1.283548 sec) Committed. Natural Join ------------ When column names to be joined to each table are the same, that is, when you want to grant equivalent conditions between each column with the same name, a natural join, which can replace inner/outer join, can be used. .. code-block:: sql CREATE TABLE t1 (a int, b1 int); CREATE TABLE t2 (a int, b2 int); INSERT INTO t1 values(1,1); INSERT INTO t1 values(3,3); INSERT INTO t2 values(1,1); INSERT INTO t2 values(2,2); The below is an example of running **NATURAL JOIN**. .. code-block:: sql SELECT /*+ RECOMPILE*/ * FROM t1 NATURAL JOIN t2; Running the above query is the same as running the below query, and they display the same result. .. code-block:: sql SELECT /*+ RECOMPILE*/ * FROM t1 INNER JOIN t2 ON t1.a=t2.a; :: a b1 a b2 ================================================ 1 1 1 1 The below is an example of running **NATURAL LEFT JOIN**. .. code-block:: sql SELECT /*+ RECOMPILE*/ * FROM t1 NATURAL LEFT JOIN t2; Running the above query is the same as running the below query, and they display the same result. .. code-block:: sql SELECT /*+ RECOMPILE*/ * FROM t1 LEFT JOIN t2 ON t1.a=t2.a; :: a b1 a b2 ==================================================== 1 1 1 1 3 3 NULL NULL The below is an example of running **NATURAL RIGHT JOIN**. .. code-block:: sql SELECT /*+ RECOMPILE*/ * FROM t1 NATURAL RIGHT JOIN t2; Running the above query is the same as running the below query, and they display the same result. .. code-block:: sql SELECT /*+ RECOMPILE*/ * FROM t1 RIGHT JOIN t2 ON t1.a=t2.a; :: a b1 a b2 ==================================================== 1 1 1 1 NULL NULL 2 2 Subquery ======== A subquery can be used wherever expressions such as **SELECT** or **WHERE** clause can be used. If the subquery is represented as an expression, it must return a single column; otherwise it can return multiple rows. Subqueries can be divided into single-row subquery and multiple-row subquery depending on how they are used. Single-Row Subquery ------------------- A single-row subquery outputs a row that has a single column. If no row is returned by the subquery, the subquery expression has a **NULL** value. If the subquery is supposed to return more than one row, an error occurs. The following example shows how to retrieve the *history* table as well as the host country where a new world record has been set. This example shows a single-row subquery used as an expression. In this example, the subquery returns *host_nation* values for the rows whose values of the *host_year* column in the *olympic* table are the same as those of the *host_year* column in the *history* table. If there are no values that meet the condition, the result of the subquery is **NULL**. .. code-block:: sql SELECT h.host_year, (SELECT host_nation FROM olympic o WHERE o.host_year=h.host_year) AS host_nation, h.event_code, h.score, h.unit FROM history h; :: host_year host_nation event_code score unit ============================================================================================ 2004 'Greece' 20283 '07:53.0' 'time' 2004 'Greece' 20283 '07:53.0' 'time' 2004 'Greece' 20281 '03:57.0' 'time' 2004 'Greece' 20281 '03:57.0' 'time' 2004 'Greece' 20281 '03:57.0' 'time' 2004 'Greece' 20281 '03:57.0' 'time' 2004 'Greece' 20326 '210' 'kg' 2000 'Australia' 20328 '225' 'kg' 2004 'Greece' 20331 '237.5' 'kg' ... Multiple-Row Subquery --------------------- The multiple-row subquery returns one or more rows that contain the specified column. The result of the multiple-row subquery can create **SET**, **MULTISET** and **LIST**) by using an appropriate keyword. The following example shows how to retrieve nations, capitals and host cities for Olympic Game all together in the *nation* table. In this example, the subquery result is used to create a **List** from the values of the *host_city* column in the *olympic* table. This query returns *name* and *capital* value for *nation* table, as well as a set that contains *host_city* values of the *olympic* table with *host_nation* value. If the *name* value is an empty set in the query result, it is excluded. If there is no *olympic* table that has the same value as the *name*, an empty set is returned. .. code-block:: sql SELECT name, capital, list(SELECT host_city FROM olympic WHERE host_nation = name) AS host_cities FROM nation; :: name capital host_cities ================================================================== 'Somalia' 'Mogadishu' {} 'Sri Lanka' 'Sri Jayewardenepura Kotte' {} 'Sao Tome & Principe' 'Sao Tome' {} ... 'U.S.S.R.' 'Moscow' {'Moscow'} 'Uruguay' 'Montevideo' {} 'United States of America' 'Washington.D.C' {'Atlanta ', 'St. Louis', 'Los Angeles', 'Los Angeles'} 'Uzbekistan' 'Tashkent' {} 'Vanuatu' 'Port Vila' {} Such multiple-row subquery expressions can be used anywhere a collection-type value expression is allowed. However, they cannot be used where a collection-type constant value is required as in the **DEFAULT** specification in the class attribute definition. If the **ORDER BY** clause is not used explicitly in the subquery, the order of the multiple-row query result is not set. Therefore, the order of the multiple-row subquery result that creates **LIST** must be specified by using the **ORDER BY** clause. MATCH ===== The **MATCH** clause is used to query graph patterns in graph tables. It can only be used with vertex and edge tables created with **CREATE VERTEX TABLE** and **CREATE EDGE TABLE**. The **MATCH** clause appears in the **FROM** clause and allows you to specify graph patterns using node and edge notation. :: SELECT ... FROM MATCH [ =] [ANY_SHORTEST | ALL_SHORTEST] [, , ...] ::= ()-[]->() | ()-[]-() | ()<-[]-() ::= : ::= [[]:][{,}] ::= * **: Optional identifier to store the matched path, which can be used with path functions like **nodes()**, **path_size()**, etc. * **ANY_SHORTEST**: Optional keyword to find any one of the shortest paths between nodes. * **ALL_SHORTEST**: Optional keyword to find all shortest paths between nodes. * **: Specifies a node (vertex) in the graph pattern with an alias and vertex table name. * **: Specifies an edge (relationship) in the graph pattern with an optional alias and edge table name. The alias and colon can be omitted for unlabeled edges (e.g., **[:edge_label]**). The edge specification is optional. You can specify variable-length paths using **{min,max}** syntax to indicate the minimum and maximum number of hops. * The **->** operator indicates a directed edge from one node to another in the forward direction. * The **<-** operator indicates a directed edge from one node to another in the reverse direction. * The **-** operator (without arrow) allows bidirectional traversal, matching edges in either direction. The following examples demonstrate the use of **MATCH** with graph tables. First, we create the graph schema: .. code-block:: sql CREATE VERTEX TABLE node_label (name STRING, id STRING); INSERT VERTEX INTO node_label VALUES ('a', '0'); INSERT VERTEX INTO node_label VALUES ('b', '1'); INSERT VERTEX INTO node_label VALUES ('a', '2'); INSERT VERTEX INTO node_label VALUES ('a', '3'); CREATE EDGE TABLE edge_label (id STRING); INSERT EDGE FROM (SELECT node_label FROM node_label WHERE id = '0') TO (SELECT node_label FROM node_label WHERE id = '1') INTO edge_label VALUES ('0'); INSERT EDGE FROM (SELECT node_label FROM node_label WHERE id = '0') TO (SELECT node_label FROM node_label WHERE id = '2') INTO edge_label VALUES ('1'); INSERT EDGE FROM (SELECT node_label FROM node_label WHERE id = '1') TO (SELECT node_label FROM node_label WHERE id = '2') INTO edge_label VALUES ('2'); INSERT EDGE FROM (SELECT node_label FROM node_label WHERE id = '1') TO (SELECT node_label FROM node_label WHERE id = '3') INTO edge_label VALUES ('3'); INSERT EDGE FROM (SELECT node_label FROM node_label WHERE id = '3') TO (SELECT node_label FROM node_label WHERE id = '0') INTO edge_label VALUES ('4'); Simple Graph Pattern Queries ----------------------------- The following examples show how to query graph patterns using the **MATCH** clause without referencing edge properties: .. code-block:: sql SELECT a.id, b.id FROM MATCH (a:node_label)->(b:node_label) WHERE a.id = '0' ORDER BY 1,2; This query finds all nodes directly connected from the node with id '0'. .. code-block:: sql SELECT a.id, b.id, c.id FROM MATCH (a:node_label)->(b:node_label), (b:node_label)->(c:node_label) WHERE a.id = '0' ORDER BY 1,2,3; This query finds all two-hop paths starting from the node with id '0'. .. code-block:: sql SELECT a.id, b.id, c.id, d.id FROM MATCH (a:node_label)->(b:node_label), (b:node_label)->(c:node_label), (c:node_label)->(d:node_label) WHERE a.id = '0' ORDER BY 1,2,3,4; This query finds all three-hop paths starting from the node with id '0'. .. code-block:: sql SELECT a.id, b.id, c.id, d.id, e.id FROM MATCH (a:node_label)->(b:node_label), (b:node_label)->(c:node_label), (c:node_label)->(d:node_label), (d:node_label)->(e:node_label) WHERE a.id = '0' ORDER BY 1,2,3,4,5; This query finds all four-hop paths starting from the node with id '0'. Graph Pattern Queries with Edge Properties ------------------------------------------- The following examples show how to query graph patterns and also retrieve edge properties by specifying the edge alias: .. code-block:: sql SELECT a.id, e.id, b.id FROM MATCH (a:node_label)-[e:edge_label]->(b:node_label) WHERE a.id = '0' ORDER BY 1,2,3; This query retrieves both the connected nodes and the edge properties. You can also use predicates on edge properties in the **WHERE** clause to filter results based on edge attributes: .. code-block:: sql SELECT a.id, e.id, b.id FROM MATCH (a:node_label)-[e:edge_label]->(b:node_label) WHERE a.id = '0' AND e.id > 0 ORDER BY 1,2,3; This query filters the results to only include edges where the edge's id property is greater than 0. .. code-block:: sql SELECT a.id, e1.id, b.id, e2.id, c.id FROM MATCH (a:node_label)-[e1:edge_label]->(b:node_label), (b:node_label)-[e2:edge_label]->(c:node_label) WHERE a.id = '0' ORDER BY 1,2,3,4,5; This query finds two-hop paths with edge information. .. code-block:: sql SELECT a.id, e1.id, b.id, e2.id, c.id, e3.id, d.id FROM MATCH (a:node_label)-[e1:edge_label]->(b:node_label), (b:node_label)-[e2:edge_label]->(c:node_label), (c:node_label)-[e3:edge_label]->(d:node_label) WHERE a.id = '0' ORDER BY 1,2,3,4,5,6,7; This query finds three-hop paths with all edge information. .. code-block:: sql SELECT a.id, e1.id, b.id, e2.id, c.id, e3.id, d.id, e4.id, f.id FROM MATCH (a:node_label)-[e1:edge_label]->(b:node_label), (b:node_label)-[e2:edge_label]->(c:node_label), (c:node_label)-[e3:edge_label]->(d:node_label), (d:node_label)-[e4:edge_label]->(f:node_label) WHERE a.id = '0' ORDER BY 1,2,3,4,5,6,7,8,9; This query finds four-hop paths with all edge information. Reverse Edge Traversal ----------------------- The **<-** operator allows you to traverse edges in the reverse direction, matching edges from right to left: .. code-block:: sql SELECT a.id, e.id, b.id FROM MATCH (a:node_label)<-[e:edge_label]-(b:node_label) WHERE a.id = '0' ORDER BY 1,2,3; This query finds all edges where node 'a' (with id '0') is the target and retrieves the source nodes and edge properties. The **<-** operator reverses the direction of edge traversal. Unlabeled Edge Patterns ------------------------ You can omit the edge alias when you don't need to access edge properties directly: .. code-block:: sql SELECT a.id, b.id FROM MATCH (a:node_label)-[:edge_label]->(b:node_label) WHERE a.id = '0' ORDER BY 1,2; This query matches the graph pattern without binding the edge to a variable, which is useful when you only need information about the connected nodes. Variable-Length Paths --------------------- The **MATCH** clause supports variable-length path queries using the **{min,max}** syntax. This allows you to specify a range for the number of hops to traverse: .. code-block:: sql SELECT a.id, b.id FROM MATCH (a:node_label)-[{1,2}]->(b:node_label) WHERE a.id = '0' ORDER BY 1,2; This query finds all nodes reachable from node with id '0' within 1 to 2 hops in the forward direction. Bidirectional Edge Traversal ----------------------------- The **MATCH** clause supports bidirectional edge traversal using the **-** operator (without an arrow). This matches edges in either direction: .. code-block:: sql SELECT a.id, b.id FROM MATCH (a:node_label)-[{1,2}]-(b:node_label) WHERE a.id = '0' ORDER BY 1,2; This query finds all nodes reachable from node with id '0' within 1 to 2 hops in either direction (forward or reverse). Shortest Path Queries --------------------- The **MATCH** clause supports shortest path queries using **ANY_SHORTEST** and **ALL_SHORTEST** keywords. These allow you to find the shortest path(s) between nodes in a graph. ANY_SHORTEST ^^^^^^^^^^^^ The **ANY_SHORTEST** keyword finds any one of the shortest paths between two nodes. This is useful when you only need to know that a shortest path exists and don't need all possible shortest paths. .. code-block:: sql SELECT a.id, b.id, horizontal_count(e) FROM MATCH ANY_SHORTEST (a:node_label)-[e:edge_label{1,2}]->(b:node_label) ORDER BY 1,2,3; This query finds any shortest path between nodes within 1 to 2 hops in the forward direction. .. code-block:: sql SELECT a.id, b.id, horizontal_count(e) FROM MATCH ANY_SHORTEST (a:node_label)-[e:edge_label{1,2}]-(b:node_label) WHERE a.id = 'A' AND b.id = 'B' ORDER BY 1,2,3; This query finds any shortest path between specific nodes 'A' and 'B' using bidirectional traversal. .. code-block:: sql SELECT a.id, b.id FROM MATCH ANY_SHORTEST (a:node_label)-[e:edge_label{1,2}]-(b:node_label) WHERE a.id < b.id ORDER BY 1,2; This query finds shortest paths where the source node ID is less than the destination node ID. ALL_SHORTEST ^^^^^^^^^^^^ The **ALL_SHORTEST** keyword finds all shortest paths between two nodes. This returns all paths that have the minimum length. .. code-block:: sql SELECT a.id, b.id FROM MATCH ALL_SHORTEST (a:node_label)-[e:edge_label{1,2}]->(b:node_label) WHERE a.id='0'; This query finds all shortest paths from node '0' to any other reachable node within 1 to 2 hops. .. code-block:: sql SELECT a.id, b.id FROM MATCH ALL_SHORTEST (a:node_label)-[e:edge_label{1,3}]->(b:node_label) WHERE a.id='0' AND e.id='0'; This query finds all shortest paths from node '0' with the additional constraint that edges must have id=1. .. code-block:: sql SELECT a.id, b.id FROM MATCH pt1 = ALL_SHORTEST (a:node_label)-[e:edge_label{1,9}]->(b:node_label) WHERE a.id='0' AND b.id='1'; This query finds all shortest paths between specific nodes '0' and '1', storing the path in variable 'pt1'. Path Variables -------------- Path variables allow you to capture and manipulate entire paths in your queries. You can assign a path to a variable and then use path functions to extract information about the path. .. code-block:: sql SELECT path_size(pth) AS pathLength FROM MATCH pth=ANY_SHORTEST (a:node_label)-[e:edge_label{1,2}]-(b:node_label) WHERE e.id = '0' AND a.id = '0' AND b.id = '1' ORDER BY 1; The query above uses the **path_size()** function to get the length of the shortest path stored in variable 'pth'. .. code-block:: sql SELECT a.id, b.id, list_comp(x in nodes(pt1) | property_of(x, id, node_label)) AS id_path FROM MATCH pt1 = ALL_SHORTEST (a:node_label)-[e:edge_label{1,6}]->(b:node_label) WHERE a.id='0' AND b.id='1'; The query above uses the **nodes()** function to extract all nodes from the path and **list_comp()** to create a list of node IDs along the path. .. code-block:: sql SELECT a.id, b.id, list_comp(x in nodes(pt1) | property_of(x, id, node_label)) AS id_path FROM MATCH pt1 = ALL_SHORTEST (b:node_label)<-[e:edge_label{1,6}]-(a:node_label) WHERE a.id='0' AND b.id='1'; The query above demonstrates reverse path traversal with a path variable, finding all shortest paths from node '0' to node '1' in the reverse direction. MATCH as a Predicate in WHERE Clause ------------------------------------- The **MATCH** clause can also be used as a predicate in the **WHERE** clause to filter results based on the existence of specific graph patterns. This allows you to combine pattern matching with other query conditions. First, let's create sample data with multiple edge types: .. code-block:: sql CREATE VERTEX TABLE Person (name STRING, id STRING); CREATE EDGE TABLE Likes (notes STRING); CREATE EDGE TABLE Hates (since STRING, grade STRING); CREATE EDGE TABLE Loves; INSERT VERTEX INTO Person VALUES ('A', '0'); INSERT VERTEX INTO Person VALUES ('B', '1'); INSERT VERTEX INTO Person VALUES ('C', '2'); INSERT VERTEX INTO Person VALUES ('D', '3'); INSERT VERTEX INTO Person VALUES ('E', '4'); INSERT VERTEX INTO Person VALUES ('F', '5'); INSERT VERTEX INTO Person VALUES ('G', '6'); INSERT VERTEX INTO Person VALUES ('H', '7'); INSERT VERTEX INTO Person VALUES ('I', '8'); INSERT EDGE FROM (SELECT Person FROM Person WHERE id = '0') TO (SELECT Person FROM Person WHERE id = '1') INTO Likes VALUES ('best friend'); INSERT EDGE FROM (SELECT Person FROM Person WHERE id = '1') TO (SELECT Person FROM Person WHERE id = '2') INTO Likes VALUES ('worst friend'); INSERT EDGE FROM (SELECT Person FROM Person WHERE id = '0') TO (SELECT Person FROM Person WHERE id = '1') INTO Loves VALUES (); INSERT EDGE FROM (SELECT Person FROM Person WHERE id = '1') TO (SELECT Person FROM Person WHERE id = '2') INTO Loves VALUES (); INSERT EDGE FROM (SELECT Person FROM Person WHERE id = '0') TO (SELECT Person FROM Person WHERE id = '3') INTO Loves VALUES (); INSERT EDGE FROM (SELECT Person FROM Person WHERE id = '3') TO (SELECT Person FROM Person WHERE id = '4') INTO Loves VALUES (); INSERT EDGE FROM (SELECT Person FROM Person WHERE id = '0') TO (SELECT Person FROM Person WHERE id = '4') INTO Loves VALUES (); INSERT EDGE FROM (SELECT Person FROM Person WHERE id = '0') TO (SELECT Person FROM Person WHERE id = '5') INTO Loves VALUES (); INSERT EDGE FROM (SELECT Person FROM Person WHERE id = '5') TO (SELECT Person FROM Person WHERE id = '6') INTO Loves VALUES (); INSERT EDGE FROM (SELECT Person FROM Person WHERE id = '6') TO (SELECT Person FROM Person WHERE id = '0') INTO Loves VALUES (); INSERT EDGE FROM (SELECT Person FROM Person WHERE id = '0') TO (SELECT Person FROM Person WHERE id = '7') INTO Hates VALUES ('1980', '8'); INSERT EDGE FROM (SELECT Person FROM Person WHERE id = '7') TO (SELECT Person FROM Person WHERE id = '8') INTO Hates VALUES ('1990', '2'); INSERT EDGE FROM (SELECT Person FROM Person WHERE id = '4') TO (SELECT Person FROM Person WHERE id = '5') INTO Hates VALUES ('2010', '10'); INSERT EDGE FROM (SELECT Person FROM Person WHERE id = '5') TO (SELECT Person FROM Person WHERE id = '7') INTO Hates VALUES ('2020', '5'); INSERT EDGE FROM (SELECT Person FROM Person WHERE id = '5') TO (SELECT Person FROM Person WHERE id = '6') INTO Hates VALUES ('2020', '7'); INSERT EDGE FROM (SELECT Person FROM Person WHERE id = '1') TO (SELECT Person FROM Person WHERE id = '7') INTO Hates VALUES ('2020', '9'); The following examples demonstrate using **MATCH** as a predicate in the **WHERE** clause: .. code-block:: sql SELECT a.name, b.name, c.name FROM MATCH (a:Person)->(b:Person), (b:Person)->(c:Person) WHERE MATCH (a:Person)-[l:Loves]->(c:Person) ORDER BY 1,2,3; This query finds two-hop paths where there is also a direct "Loves" edge from 'a' to 'c', effectively filtering for paths where the starting and ending nodes have a love relationship. .. code-block:: sql SELECT a.name, b.name, c.name FROM MATCH (a:Person)->(b:Person), (b:Person)->(c:Person) WHERE MATCH (a:Person)-[l:Loves]->(c:Person) AND a.name = 'A' ORDER BY 1,2,3; This query adds an additional condition to filter for paths starting from person 'A'. .. code-block:: sql SELECT a.name, b.name, c.name FROM MATCH (a:Person)->(b:Person), (b:Person)->(c:Person) WHERE c.name = 'E' AND MATCH (a:Person)-[l:Loves]->(c:Person) ORDER BY 1,2,3; This query filters for paths ending at person 'E' where there is also a "Loves" edge from the start to the end. Using NOT with MATCH Predicates ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ You can also use the **NOT** operator with **MATCH** predicates to filter results based on the non-existence of graph patterns: .. code-block:: sql SELECT a.name, b.name, k.since, k.grade FROM MATCH (a:Person)->(b:Person), (b:Person)-[k:Hates]->(c:Person) WHERE NOT MATCH (a:Person)<->(b:Person) OR a.name = 'A' OR a.name = 'C' OR a.name = 'D' ORDER BY 1,2,3,4; This query demonstrates using **NOT MATCH** to filter results where a specific pattern does not exist. The bidirectional operator **<->** matches edges in either direction between nodes. The query finds paths through intermediary nodes where certain conditions are met, combined with the absence of specific patterns. Joins ^^^^^ The **MATCH** clause supports all standard SQL **JOIN** operators, allowing you to combine multiple graph patterns and join graph patterns with relational tables. This enables complex queries that span both graph and relational data models. **Key Features:** * **JOIN operators**: INNER JOIN, LEFT OUTER JOIN, RIGHT OUTER JOIN can all be used with MATCH patterns * **Mixed joins**: You can join MATCH patterns with regular relational tables * **Multiple patterns**: Join multiple independent MATCH patterns together * **ON conditions**: Standard ON clauses work with graph pattern results The following examples use a Formula 1 racing database schema: .. code-block:: sql CREATE VERTEX TABLE drivers (driverId INT NOT NULL AUTO_INCREMENT, race_number INT, driver_name STRING, surname STRING, nationality STRING); CREATE VERTEX TABLE constructors (constructor_id INT NOT NULL AUTO_INCREMENT, constructor_name STRING, nationality STRING, fans_cnt INT, social_platform ENUM('Facebook', 'Instagram') DEFAULT 'Facebook'); CREATE VERTEX TABLE circuits (circuitId INT NOT NULL AUTO_INCREMENT, circuit_name STRING); CREATE VERTEX TABLE countries (countryId INT NOT NULL AUTO_INCREMENT, name STRING, fans_cnt INT, social_platform ENUM('Facebook', 'Instagram') DEFAULT 'Facebook'); CREATE EDGE TABLE DRIVES_FOR (id INT NOT NULL AUTO_INCREMENT, driverId INT, constructor_id INT, since STRING, PRIMARY KEY (id)); CREATE EDGE TABLE IS_WINNER_OF (id INT NOT NULL AUTO_INCREMENT, driverId INT, circuitId INT, season STRING, PRIMARY KEY (id)); CREATE EDGE TABLE BORN_IN (id INT NOT NULL AUTO_INCREMENT, driverId INT, countryId INT, since STRING, PRIMARY KEY (id)); CREATE EDGE TABLE RESIDES_IN (id INT NOT NULL AUTO_INCREMENT, constructor_id INT, countryId INT, since STRING, PRIMARY KEY (id)); CREATE EDGE TABLE LOCATED_IN (id INT NOT NULL AUTO_INCREMENT, circuitId INT, countryId INT, since STRING, PRIMARY KEY (id)); CREATE EDGE TABLE RACE_AT (id INT NOT NULL AUTO_INCREMENT, driverId INT, circuitId INT, event_date DATE, race_name STRING); Joining Multiple MATCH Patterns """""""""""""""""""""""""""""""" The following examples demonstrate joining multiple independent MATCH patterns together using standard JOIN operators. **Example 1: Multiple INNER JOINs with MATCH Patterns** This query demonstrates joining multiple independent MATCH patterns using INNER JOIN: .. code-block:: sql SELECT /*+ RECOMPILE */ DISTINCT D.RACE_NUMBER, D.DRIVER_NAME, D.SURNAME, driver.DRIVER_NAME, driver.SURNAME, team.CONSTRUCTOR_NAME, team.NATIONALITY, winner.id, home.name, home2.name, ISF1.since, ISF2.since, home3.name, bi.since FROM MATCH (D:DRIVERS)-[df:DRIVES_FOR]->(team:Constructors), (team:Constructors)-[isf1:RESIDES_IN]->(home:countries) INNER JOIN MATCH (driver2:DRIVERS)-[bi:BORN_IN]->(home3:countries) ON d.driverId = driver2.driverId INNER JOIN MATCH (driver:drivers)-[winner:IS_WINNER_OF]->(gp:circuits), (gp:circuits)-[isf2:LOCATED_IN]->(home2:countries) ON home3.countryId = home2.countryId WHERE isf1.since = '2009' AND home.name = 'Belgium' ORDER BY 1,2,3,4,5,6,7,8,9,10,11,12,13,14; This query joins three separate MATCH patterns: drivers with their teams and team locations, drivers with their birth countries, and race winners with circuit locations. The ON clauses link the patterns through common attributes. **Example 2: LEFT OUTER JOIN with MATCH Patterns** This query demonstrates using LEFT OUTER JOIN to include all results from the left pattern even when there's no match in the right pattern: .. code-block:: sql SELECT /*+ RECOMPILE */ DISTINCT D.RACE_NUMBER, D.DRIVER_NAME, D.SURNAME, D.NATIONALITY, T.CONSTRUCTOR_NAME, T.NATIONALITY, C2.CIRCUIT_NAME, W.SEASON, W2.EVENT_DATE FROM MATCH (D:DRIVERS)-[W:IS_WINNER_OF]->(C:CIRCUITS) LEFT OUTER JOIN MATCH (T:CONSTRUCTORS)-[W2:RACE_AT]->(C2:CIRCUITS) ON D.NATIONALITY = T.NATIONALITY WHERE D.DRIVER_NAME = 'Charles' AND T.NATIONALITY = 'French' ORDER BY 4, 6, 1, 2, 3, 5, 7, 8, 9; This query finds race winners and performs a LEFT OUTER JOIN with constructor race information, matching on nationality. Drivers without matching constructor information will still appear in the results with NULL values. **Example 3: Multiple RIGHT JOINs with MATCH Patterns** This query demonstrates using RIGHT JOIN to prioritize results from the right-hand MATCH patterns: .. code-block:: sql SELECT /*+ RECOMPILE */ DISTINCT D.RACE_NUMBER, D.DRIVER_NAME, D.SURNAME, D.NATIONALITY, T.CONSTRUCTOR_NAME, T.NATIONALITY, C2.CIRCUIT_NAME, W.SEASON, W2.EVENT_DATE, countries.name FROM MATCH (D:DRIVERS)-[W:IS_WINNER_OF]-(C:CIRCUITS) RIGHT JOIN MATCH (T:CONSTRUCTORS)-[W2:RACE_AT]-(C2:CIRCUITS) ON C.circuitId = C2.circuitId RIGHT JOIN MATCH (countries:Countries) ON T.social_platform = countries.social_platform WHERE D.NATIONALITY = 'Australian' ORDER BY 4, 6, 1, 2, 3, 5, 7, 8, 9; This query uses two RIGHT JOINs to ensure all constructors and countries appear in the results, even if there are no matching drivers or circuits. The WHERE clause filters the final result set. Hybrid Joins with Relational Tables """""""""""""""""""""""""""""""""""" MATCH patterns can also be joined with regular relational tables. This allows you to combine graph traversal with traditional relational queries in a single statement, enabling powerful hybrid queries that span both relational and graph data models. **Example 1: INNER JOIN with Multiple Relational Tables and MATCH Patterns** This query demonstrates using INNER JOIN to combine multiple relational tables with MATCH patterns: .. code-block:: sql CREATE TABLE race_results (race_id INT, driver_id INT, pos INT, points FLOAT); CREATE TABLE seasons (season_id INT, yearOf INT, champion_id INT); SELECT /*+ RECOMPILE */ DISTINCT Drivers.DRIVER_NAME, Drivers.NATIONALITY, seasons.yearOf, race_results.pos, T.CONSTRUCTOR_NAME, C.CIRCUIT_NAME FROM Drivers INNER JOIN race_results ON Drivers.driverId = race_results.driver_id INNER JOIN seasons ON race_results.race_id = seasons.season_id INNER JOIN MATCH (D:DRIVERS)-[df:DRIVES_FOR]->(T:CONSTRUCTORS) ON Drivers.driverId = D.driverId INNER JOIN MATCH (D2:DRIVERS)-[W:IS_WINNER_OF]->(C:CIRCUITS) ON D.driverId = D2.driverId WHERE seasons.yearOf = 2023 AND race_results.pos <= 3 ORDER BY 1, 2, 3; This query combines three relational tables (Drivers, race_results, seasons) with two MATCH patterns to find drivers who finished in the top 3 positions, showing their team affiliations and race wins. All joins use INNER JOIN to ensure only matching records appear in the results. **Example 2: LEFT JOIN with Relational Tables and MATCH Patterns** This query demonstrates using LEFT JOIN to preserve all relational table rows while optionally including graph data: .. code-block:: sql CREATE TABLE driver_contracts (contract_id INT, driver_id INT, start_date DATE, end_date DATE, salary FLOAT); CREATE TABLE team_budgets (team_id INT, yearOf INT, budget FLOAT); SELECT /*+ RECOMPILE */ DISTINCT Drivers.DRIVER_NAME, Drivers.SURNAME, driver_contracts.salary, team_budgets.budget, T.CONSTRUCTOR_NAME, home.name AS country_name FROM Drivers LEFT JOIN driver_contracts ON Drivers.driverId = driver_contracts.driver_id LEFT JOIN team_budgets ON driver_contracts.contract_id = team_budgets.team_id LEFT JOIN MATCH (D:DRIVERS)-[df:DRIVES_FOR]->(T:CONSTRUCTORS) ON Drivers.driverId = D.driverId LEFT JOIN MATCH (T2:CONSTRUCTORS)-[res:RESIDES_IN]->(home:countries) ON T.constructor_id = T2.constructor_id WHERE Drivers.NATIONALITY = 'British' OR Drivers.NATIONALITY = 'German' ORDER BY 1, 2; This query retrieves all drivers from Britain or Germany and their contract information, along with their team and team location data when available. Using LEFT JOIN ensures that all drivers appear in the results even if they don't have active contracts or team affiliations in the graph. VALUES ====== The **VALUES** clause prints out the values of rows defined in the expression. In most cases, the **VALUES** clause is used for creating a constant table, however, the clause itself can be used. When one or more rows are specified in the **VALUES** clause, all rows should have the same number of the elements. VALUES (expression[, ...])[, ...] * *expression* : An expression enclosed within parentheses stands for one row in a table. The **VALUES** clause can be used to express the **UNION ALL** query, which consists of constant values in a simpler way. For example, the following query can be executed. .. code-block:: sql VALUES (1 AS col1, 'first' AS col2), (2, 'second'), (3, 'third'), (4, 'fourth'); The above query prints out the following result. .. code-block:: sql SELECT 1 AS col1, 'first' AS col2 UNION ALL SELECT 2, 'second' UNION ALL SELECT 3, 'third' UNION ALL SELECT 4, 'fourth'; The following example shows use of the **VALUES** clause with multiple rows in the **INSERT** statement. .. code-block:: sql INSERT INTO athlete (code, name, gender, nation_code, event) VALUES ('21111', 'Jang Mi-Ran ', 'F', 'KOR', 'Weight-lifting'), ('21112', 'Son Yeon-Jae ', 'F', 'KOR', 'Rhythmic gymnastics'); The following example shows how to use subquery in the **FROM** statement. .. code-block:: sql SELECT a.* FROM athlete a, (VALUES ('Jang Mi-Ran', 'F'), ('Son Yeon-Jae', 'F')) AS t(name, gender) WHERE a.name=t.name AND a.gender=t.gender; :: code name gender nation_code event ===================================================================================================== 21111 'Jang Mi-Ran' 'F' 'KOR' 'Weight-lifting' 21112 'Son Yeon-Jae' 'F' 'KOR' 'Rhythmic gymnastics' FOR UPDATE ========== The **FOR UPDATE** clause can be used in **SELECT** statements for locking rows returned by the statement for a later **UPDATE/DELETE**. :: SELECT ... [FOR UPDATE [OF ]] ::= [, , ... ] ::= table_name | view_name * <*spec_name_comma_list*>: A list of table/view names referenced from the **FROM** clause. Only table/view referenced in <*spec_name_comma_list*> will be locked. If the <*spec_name_comma_list*> is missing but **FOR UPDATE** is present then we assume that all tables/views from the **FROM** clause of the **SELECT** statement are referenced. Rows are locked using **X_LOCK**. .. note:: Restrictions * It cannot be used in subqueries (but it can reference subqueries). * It cannot be used in a statement that has **GROUP BY**, **DISTINCT** or aggregate functions. * It cannot reference **UNION**\s. The following shows how to use **SELECT ... FOR UPDATE** statements. .. code-block:: sql CREATE TABLE t1(i INT); INSERT INTO t1 VALUES (1), (2), (3), (4), (5); CREATE TABLE t2(i INT); INSERT INTO t2 VALUES (1), (2), (3), (4), (5); CREATE INDEX idx_t2_i ON t2(i); CREATE VIEW v12 AS SELECT t1.i AS i1, t2.i AS i2 FROM t1 INNER JOIN t2 ON t1.i=t2.i; SELECT * FROM t1 ORDER BY 1 FOR UPDATE; SELECT * FROM t1 ORDER BY 1 FOR UPDATE OF t1; SELECT * FROM t1 INNER JOIN t2 ON t1.i=t2.i ORDER BY 1 FOR UPDATE OF t1, t2; SELECT * FROM t1 INNER JOIN (SELECT * FROM t2 WHERE t2.i > 0) r ON t1.i=r.i WHERE t1.i > 0 ORDER BY 1 FOR UPDATE; SELECT * FROM v12 ORDER BY 1 FOR UPDATE; SELECT * FROM t1, (SELECT * FROM v12, t2 WHERE t2.i > 0 AND t2.i=v12.i1) r WHERE t1.i > 0 AND t1.i=r.i ORDER BY 1 FOR UPDATE OF r;