Extremely Serious

Category: Database

Understanding the Fundamental Categories of Enterprise Data

In the world of data management, enterprises deal with diverse types of information crucial for their operations. Three fundamental categories play a pivotal role in organizing and utilizing this wealth of data: Master Data, Transaction Data, and Reference Data.

Master Data

Master data represents the core business entities that are shared across an organization. This includes but is not limited to:

  • Customer Information: Details about customers, their profiles, and interactions.
  • Product Data: Comprehensive information about products or services offered.
  • Employee Records: Data related to employees, their roles, and responsibilities.

Master data serves as a foundational element, providing a consistent and accurate view of key entities, fostering effective decision-making and streamlined business processes.

Transaction Data

Transaction data captures the day-to-day operations of an organization. It includes records of individual business activities and interactions, such as:

  • Sales Orders: Information about customer purchases and sales transactions.
  • Invoices: Documentation of financial transactions between the business and its clients.
  • Payment Records: Details of payments made or received.

Transaction data is dynamic, changing with each business activity, and is crucial for real-time monitoring and analysis of operational performance.

Reference Data

Reference data is relatively static, rarely changing information used to categorize other data. It provides a standardized framework for classifying and organizing data. Examples include:

  • Country Codes: Standardized codes for different countries.
  • Product Classifications: Codes or categories for organizing products.
  • Business Units: Classifications for different business segments.

Reference data ensures consistency in data interpretation across the organization, facilitating interoperability and accurate reporting.
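The three categories described so far can be sketched as tables. The following is only an illustration: it uses Python's sqlite3 module with an in-memory SQLite database, and the table names, column names, and sample rows (country, customer, sales_order) are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Reference data: a small, rarely changing code list.
    CREATE TABLE country (
        country_code TEXT PRIMARY KEY,
        country_name TEXT NOT NULL
    );
    -- Master data: a core business entity shared across processes.
    CREATE TABLE customer (
        customer_id   INTEGER PRIMARY KEY,
        customer_name TEXT NOT NULL,
        country_code  TEXT NOT NULL REFERENCES country(country_code)
    );
    -- Transaction data: one row per business event.
    CREATE TABLE sales_order (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
        order_date  TEXT NOT NULL,
        amount      REAL NOT NULL
    );
    INSERT INTO country VALUES ('PH', 'Philippines'), ('US', 'United States');
    INSERT INTO customer VALUES (1, 'Acme Trading', 'PH');
    INSERT INTO sales_order VALUES (100, 1, '2024-01-15', 250.0),
                                   (101, 1, '2024-02-01', 100.0);
""")

# Transactions are summarized through the master and reference tables.
category_rows = conn.execute("""
    SELECT c.customer_name, k.country_name, SUM(o.amount)
    FROM sales_order o
    JOIN customer c ON c.customer_id = o.customer_id
    JOIN country k ON k.country_code = c.country_code
    GROUP BY c.customer_name, k.country_name
""").fetchall()
print(category_rows)
```

The transaction table grows with every order, while the customer and country rows are reused by each of them.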

Beyond the Basics

While Master Data, Transaction Data, and Reference Data form the bedrock of enterprise data management, the landscape can be more nuanced. Additional types of data may include:

  • Metadata: Information that describes the characteristics of other data, providing context and facilitating understanding.
  • Historical Data: Records of past transactions and events, essential for trend analysis and forecasting.
  • Analytical Data: Information used for business intelligence and decision support.

Understanding the intricacies of these data categories empowers organizations to implement robust data management strategies, fostering efficiency, accuracy, and agility in an increasingly data-driven world.

In conclusion, mastering the distinctions between Master Data, Transaction Data, and Reference Data is essential for organizations aiming to harness the full potential of their information assets. By strategically managing these categories, businesses can lay the foundation for informed decision-making, operational excellence, and sustained growth.

Understanding Database Cardinality Relationships

In the realm of relational databases, cardinality relationships define the connections between tables and govern how instances of one entity relate to instances of another. Let's delve into three cardinality relationships with a consistent example, illustrating each with table declarations.

1. One-to-One (1:1) Relationship

In a one-to-one relationship, each record in the first table corresponds to exactly one record in the second table, and vice versa. Consider the relationship between Students and DormRooms:

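-- DormRooms is created first so that the foreign key in Students can reference it.
CREATE TABLE DormRooms (
    dorm_room_id INT PRIMARY KEY,
    room_number INT
);

CREATE TABLE Students (
    student_id INT PRIMARY KEY,
    student_name VARCHAR(50),
    dorm_room_id INT UNIQUE, -- UNIQUE keeps two students from sharing a room
    FOREIGN KEY (dorm_room_id) REFERENCES DormRooms(dorm_room_id)
);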

Here, each student is assigned at most one dorm room, and each dorm room is assigned to at most one student: the UNIQUE constraint on dorm_room_id prevents two students from sharing a room.
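As a rough check of how the constraint behaves, the sketch below recreates the two tables in an in-memory SQLite database through Python's sqlite3 module. SQLite and the sample rows (Alice, Bob, room 101) are assumptions made for illustration; other engines report the violation with their own error types.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
    CREATE TABLE DormRooms (
        dorm_room_id INTEGER PRIMARY KEY,
        room_number  INTEGER
    );
    CREATE TABLE Students (
        student_id   INTEGER PRIMARY KEY,
        student_name VARCHAR(50),
        dorm_room_id INTEGER UNIQUE,
        FOREIGN KEY (dorm_room_id) REFERENCES DormRooms(dorm_room_id)
    );
    INSERT INTO DormRooms VALUES (1, 101);
    INSERT INTO Students VALUES (1, 'Alice', 1);
""")

# A second student pointing at the same room violates the UNIQUE constraint.
try:
    conn.execute("INSERT INTO Students VALUES (2, 'Bob', 1)")
    second_assignment_rejected = False
except sqlite3.IntegrityError:
    second_assignment_rejected = True

print(second_assignment_rejected)
```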

2. One-to-Many (1:N) Relationship

In a one-to-many relationship, each record in the first table can be associated with multiple records in the second table, but each record in the second table is associated with only one record in the first table. Consider the relationship between Departments and Professors:

CREATE TABLE Departments (
    department_id INT PRIMARY KEY,
    department_name VARCHAR(50)
);

CREATE TABLE Professors (
    professor_id INT PRIMARY KEY,
    professor_name VARCHAR(50),
    department_id INT,
    FOREIGN KEY (department_id) REFERENCES Departments(department_id)
);

In this case, each department can have multiple professors, but each professor is associated with only one department.
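To see the one-to-many shape in query results, here is a small sketch using Python's sqlite3 module and an in-memory SQLite database. The department and professor rows are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Departments (
        department_id   INTEGER PRIMARY KEY,
        department_name VARCHAR(50)
    );
    CREATE TABLE Professors (
        professor_id   INTEGER PRIMARY KEY,
        professor_name VARCHAR(50),
        department_id  INTEGER,
        FOREIGN KEY (department_id) REFERENCES Departments(department_id)
    );
    INSERT INTO Departments VALUES (1, 'Physics'), (2, 'History');
    INSERT INTO Professors VALUES (10, 'Curie', 1),
                                  (11, 'Feynman', 1),
                                  (12, 'Herodotus', 2);
""")

# One department row fans out to many professor rows.
professors_per_department = conn.execute("""
    SELECT d.department_name, COUNT(p.professor_id)
    FROM Departments d
    LEFT JOIN Professors p ON p.department_id = d.department_id
    GROUP BY d.department_id, d.department_name
    ORDER BY d.department_id
""").fetchall()
print(professors_per_department)
```

Because the foreign key lives on the "many" side, a professor row can name only one department.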

3. Many-to-Many (M:N) Relationship

In a many-to-many relationship, multiple records in the first table can be associated with multiple records in the second table, and vice versa. Consider the relationship between Students and Courses:

CREATE TABLE Students (
    student_id INT PRIMARY KEY,
    student_name VARCHAR(50)
);

CREATE TABLE Courses (
    course_id INT PRIMARY KEY,
    course_name VARCHAR(50)
);

CREATE TABLE StudentCourses (
    student_id INT,
    course_id INT,
    PRIMARY KEY (student_id, course_id),
    FOREIGN KEY (student_id) REFERENCES Students(student_id),
    FOREIGN KEY (course_id) REFERENCES Courses(course_id)
);

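In this scenario, each student can enroll in multiple courses, and each course can have multiple students. The StudentCourses junction table holds one row per enrollment, and its composite primary key prevents the same enrollment from being recorded twice.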
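The junction table can be exercised with a short sketch, again using Python's sqlite3 module with an in-memory SQLite database. The students, courses, and enrollments are hypothetical sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Students (student_id INTEGER PRIMARY KEY, student_name VARCHAR(50));
    CREATE TABLE Courses  (course_id  INTEGER PRIMARY KEY, course_name  VARCHAR(50));
    CREATE TABLE StudentCourses (
        student_id INTEGER,
        course_id  INTEGER,
        PRIMARY KEY (student_id, course_id),
        FOREIGN KEY (student_id) REFERENCES Students(student_id),
        FOREIGN KEY (course_id)  REFERENCES Courses(course_id)
    );
    INSERT INTO Students VALUES (1, 'Alice'), (2, 'Bob');
    INSERT INTO Courses  VALUES (10, 'Databases'), (20, 'Algorithms');
    INSERT INTO StudentCourses VALUES (1, 10), (1, 20), (2, 10);
""")

# Joining through the junction table resolves both sides of the relationship.
enrollments = conn.execute("""
    SELECT c.course_name, s.student_name
    FROM StudentCourses sc
    JOIN Students s ON s.student_id = sc.student_id
    JOIN Courses  c ON c.course_id  = sc.course_id
    ORDER BY c.course_name, s.student_name
""").fetchall()
print(enrollments)
```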

Understanding these cardinality relationships is essential for designing robust and efficient relational databases, ensuring the integrity and consistency of data across tables.

Common Table Expression (CTE) – With Clause

The with clause is also known as a common table expression (CTE) or subquery factoring. It defines a temporary named result set.

SQL:1999 added the with clause to define "statement scoped views". They are not stored in the database schema: instead, they are only valid in the query they belong to. This makes it possible to improve the structure of a statement without polluting the global namespace.

Syntax

with <QUERY_NAME_1> (<COLUMN_1>[, <COLUMN_2>][, <COLUMN_N>]) as
     (<INNER_SELECT_STATEMENT>)
[,<QUERY_NAME_2> (<COLUMN_1>[, <COLUMN_2>][, <COLUMN_N>]) as
     (<INNER_SELECT_STATEMENT>)]
<SELECT_STATEMENT>

Non-Recursive Example

with sales_tbl as (
select sales.*
	from (VALUES
		('Spiderman',1,19750),
		('Batman',1,19746),
		('Superman',1,9227),
		('Iron Man',1,9227),
		('Wonder Woman',2,16243),
		('Kikkoman',2,17233),
		('Cat Woman',2,8308),
		('Ant Man',3,19427),
		('Aquaman',3,16369),
		('Iceman',3,9309)
	) sales (emp_name,dealer_id,sales)
)
select ROW_NUMBER() over (order by dealer_id) as rownumber, *
from sales_tbl

Recursive Example

WITH [counter] AS (

   SELECT 1 AS n  -- Executes first and only once.

   UNION ALL      -- UNION ALL must be used.

   SELECT n + 1   -- The portion that will be executed 
   FROM [counter] -- repeatedly until there's no row 
                  -- to return.

   WHERE  n < 50  -- Ensures that the query stops.
)
SELECT n FROM [counter]
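The same counter can be tried outside SQL Server. The sketch below is an assumption-laden port to SQLite, run through Python's sqlite3 module: it uses the WITH RECURSIVE spelling and drops the square brackets around the CTE name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

rows = conn.execute("""
    WITH RECURSIVE counter(n) AS (
        SELECT 1                -- anchor member: runs once
        UNION ALL
        SELECT n + 1            -- recursive member: runs until no row is returned
        FROM counter
        WHERE n < 50            -- termination condition
    )
    SELECT n FROM counter ORDER BY n
""").fetchall()

numbers = [row[0] for row in rows]
print(numbers[0], numbers[-1], len(numbers))
```

Without the WHERE condition the recursive member would keep producing rows, so the termination check is the part to get right first.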

SQL Window Functions

Window functions are closely related to aggregate functions, except that they retain all the rows instead of collapsing each group into a single row.

Categories

  • Aggregate Window Functions
  • Ranking Window Functions
  • Value Window Functions

Example

with sales_tbl as (
select sales.*
	from (VALUES
		('Spiderman',1,19750),
		('Batman',1,19746),
		('Superman',1,9227),
		('Iron Man',1,9227),
		('Wonder Woman',2,16243),
		('Kikkoman',2,17233),
		('Cat Woman',2,8308),
		('Ant Man',3,19427),
		('Aquaman',3,16369),
		('Iceman',3,9309)
	) sales (emp_name,dealer_id,sales)
)
select ROW_NUMBER() over (order by dealer_id) as rownumber,
	*,
	AVG(sales) over (partition by dealer_id) as [Average Sales by DealerID],
	SUM(sales) over (partition by dealer_id) as [Total Sales by DealerID],
	SUM(sales) over (partition by dealer_id order by sales rows between unbounded preceding and current row) as [Running Total by DealerID]
from sales_tbl
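For readers without SQL Server at hand, a cut-down version of the query can be run on SQLite through Python's sqlite3 module. This is a sketch under assumptions: it needs an SQLite build with window function support (3.25 or later), loads the sample rows into a real table, and replaces the bracketed column aliases with plain names.

```python
import sqlite3

SALES = [
    ("Spiderman", 1, 19750), ("Batman", 1, 19746), ("Superman", 1, 9227),
    ("Iron Man", 1, 9227), ("Wonder Woman", 2, 16243), ("Kikkoman", 2, 17233),
    ("Cat Woman", 2, 8308), ("Ant Man", 3, 19427), ("Aquaman", 3, 16369),
    ("Iceman", 3, 9309),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_tbl (emp_name TEXT, dealer_id INTEGER, sales INTEGER)")
conn.executemany("INSERT INTO sales_tbl VALUES (?, ?, ?)", SALES)

window_rows = conn.execute("""
    SELECT emp_name, dealer_id, sales,
           SUM(sales) OVER (PARTITION BY dealer_id) AS total_sales,
           SUM(sales) OVER (PARTITION BY dealer_id ORDER BY sales
                            ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_total
    FROM sales_tbl
    ORDER BY dealer_id, sales
""").fetchall()

# Unlike GROUP BY, every input row is still present in the output.
dealer2 = [row for row in window_rows if row[1] == 2]
for row in dealer2:
    print(row)
```

Only dealer 2 is printed because its sales values are all distinct. Dealer 1 contains a tie, so the order of the tied rows within the running total is not guaranteed.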

Value Window Functions

LAG()

Accesses data from a previous row in the same result set without the use of a self-join starting with SQL Server 2012 (11.x). LAG provides access to a row at a given physical offset that comes before the current row. Use this analytic function in a SELECT statement to compare values in the current row with values in a previous row.

LEAD()

Accesses data from a subsequent row in the same result set without the use of a self-join starting with SQL Server 2012 (11.x). LEAD provides access to a row at a given physical offset that follows the current row. Use this analytic function in a SELECT statement to compare values in the current row with values in a following row.

Common Syntax

LAG | LEAD
( expression [, offset [, default ] ] )
OVER ( [ PARTITION BY expr_list ] ORDER BY order_list )
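A quick way to see what LAG and LEAD return is to run them over a few rows. The sketch below uses Python's sqlite3 module with SQLite 3.25 or later (an assumption, since the descriptions above are written for SQL Server) and three rows borrowed from the earlier sales example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

lag_lead_rows = conn.execute("""
    WITH sales_tbl(emp_name, sales) AS (
        VALUES ('Ant Man', 19427), ('Aquaman', 16369), ('Iceman', 9309)
    )
    SELECT emp_name, sales,
           LAG(sales)  OVER (ORDER BY sales DESC) AS previous_sales,
           LEAD(sales) OVER (ORDER BY sales DESC) AS next_sales
    FROM sales_tbl
    ORDER BY sales DESC
""").fetchall()

for row in lag_lead_rows:
    print(row)
```

The first row has no previous row and the last row has no following row, so those positions come back as NULL (None in Python) unless a default is supplied.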

FIRST_VALUE()

Returns the first value in an ordered set of values in SQL Server 2019 (15.x).

LAST_VALUE()

Returns the last value in an ordered set of values in SQL Server 2019 (15.x).

Common Syntax

FIRST_VALUE | LAST_VALUE
( expression ) OVER
( [ PARTITION BY expr_list ] [ ORDER BY order_list ][ frame_clause ] )
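The frame clause matters more for LAST_VALUE than the syntax suggests. When ORDER BY is given without a frame, the default frame ends at the current row, so LAST_VALUE tends to return the current row's own value. The sketch below shows the difference on SQLite 3.25 or later via Python's sqlite3 module; the engine choice and the column aliases are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

first_last_rows = conn.execute("""
    WITH sales_tbl(emp_name, sales) AS (
        VALUES ('Ant Man', 19427), ('Aquaman', 16369), ('Iceman', 9309)
    )
    SELECT emp_name,
           FIRST_VALUE(emp_name) OVER (ORDER BY sales) AS lowest_seller,
           -- Default frame: stops at the current row.
           LAST_VALUE(emp_name)  OVER (ORDER BY sales) AS last_default_frame,
           -- Explicit frame: spans the whole partition.
           LAST_VALUE(emp_name)  OVER (ORDER BY sales
               ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS highest_seller
    FROM sales_tbl
    ORDER BY sales
""").fetchall()

for row in first_last_rows:
    print(row)
```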

Ranking Window Functions

CUME_DIST()

For SQL Server, this function calculates the cumulative distribution of a value within a group of values. In other words, CUME_DIST calculates the relative position of a specified value in a group of values. Assuming ascending ordering, the CUME_DIST of a value in row r is defined as the number of rows with values less than or equal to that value in row r, divided by the number of rows evaluated in the partition or query result set.

DENSE_RANK()

This function returns the rank of each row within a result set partition, with no gaps in the ranking values. The rank of a specific row is one plus the number of distinct rank values that come before that specific row.

NTILE()

Distributes the rows in an ordered partition into a specified number of groups. The groups are numbered, starting at one. For each row, NTILE returns the number of the group to which the row belongs.

PERCENT_RANK()

Calculates the relative rank of a row within a group of rows in SQL Server 2019 (15.x). Use PERCENT_RANK to evaluate the relative standing of a value within a query result set or partition. PERCENT_RANK is similar to the CUME_DIST function.

RANK()

Returns the rank of each row within the partition of a result set. The rank of a row is one plus the number of rows that rank before it, so tied rows share a rank and the ranks that follow skip ahead, leaving gaps.

ROW_NUMBER()

Numbers the output of a result set. More specifically, returns the sequential number of a row within a partition of a result set, starting at 1 for the first row in each partition.

Common Syntax

window_function () OVER clause
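The differences between the ranking functions show up only when there are ties. The following sketch runs four of them over the dealer 1 rows from the earlier sales example, where two employees share the same sales figure. It assumes SQLite 3.25 or later through Python's sqlite3 module, and the column aliases are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

ranking_rows = conn.execute("""
    WITH sales_tbl(emp_name, sales) AS (
        VALUES ('Spiderman', 19750), ('Batman', 19746),
               ('Superman', 9227), ('Iron Man', 9227)
    )
    SELECT sales,
           ROW_NUMBER() OVER (ORDER BY sales) AS row_num,
           RANK()       OVER (ORDER BY sales) AS rnk,
           DENSE_RANK() OVER (ORDER BY sales) AS dense_rnk,
           CUME_DIST()  OVER (ORDER BY sales) AS cume
    FROM sales_tbl
    ORDER BY sales, row_num
""").fetchall()

for row in ranking_rows:
    print(row)
```

The tied rows share rank 1. RANK then jumps to 3, DENSE_RANK continues with 2, and ROW_NUMBER numbers the tied rows 1 and 2 in an arbitrary order.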

Aggregate Window Functions

AVG()

This function returns the average of the values in a group.

COUNT()

This function returns the number of items found in a group.

MAX()

Returns the maximum value in the expression.

MIN()

Returns the minimum value in the expression.

STDEV()

Returns the statistical standard deviation of all values in the specified expression.

STDEVP()

Returns the statistical standard deviation for the population for all values in the specified expression.

SUM()

Returns the sum of all the values, or only the DISTINCT values, in the expression.

VAR()

Returns the statistical variance of all values in the specified expression.

VARP()

Returns the statistical variance for the population for all values in the specified expression.

Common Syntax

window_function ( [ ALL ] expression ) 
OVER ( [ PARTITION BY expr_list ] [ ORDER BY order_list frame_clause ] )

Using Embedded Derby in Java 9 with Gradle

  1. Add the following dependencies to your build.gradle file:
    implementation group: 'org.apache.derby', name: 'derby', version: '10.15.1.3'
    implementation group: 'org.apache.derby', name: 'derbyshared', version: '10.15.1.3'
  2. Add the following entries to your module-info.java file:
    requires org.apache.derby.engine;
    requires org.apache.derby.commons;
    requires java.sql;
    
  3. In your Java class, you can create a connection like the following:
    final String DATABASE_NAME = "sample_table";
    String connectionURL = String.format("jdbc:derby:%s;create=true", DATABASE_NAME);
    Connection connection = DriverManager.getConnection(connectionURL);
    
  4. Perform the database operations you need with the connection (e.g. create a table, execute a query, or call a stored procedure).
  5. When you're done using the connection, close it and shut down the Derby engine as follows:
    connection.close();
    
    boolean gotSQLExc = false;
    try {
        //shutdown all databases and the Derby engine
        DriverManager.getConnection("jdbc:derby:;shutdown=true");
    } catch (SQLException se)  {
        // Constant-first comparison avoids a NullPointerException if the SQL state is null.
        if ("XJ015".equals(se.getSQLState())) {
            gotSQLExc = true;
        }
    }
    if (!gotSQLExc) {
        System.out.println("Database did not shut down normally");
    }

    A clean shutdown always throws SQL exception XJ015, which can be ignored.

Using Values as the Record Source for Select Statement

A VALUES list can serve as the record source of a select statement if it is given a table alias and column names, as follows:

SELECT dummy_table.* 
FROM (VALUES ('record1'),
	 ('record2'),
	 ('record3'),
	 ('record4'),
	 ('record5'),
	 ('record6'),
	 ('record7'),
	 ('record8'),
	 ('record9')) dummy_table (column1)

In the preceding select statement, dummy_table is the table alias and column1 is the column name.
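The same idea can be tried on SQLite, with one adjustment. To my knowledge SQLite does not accept a column list after a derived-table alias, so the sketch below names the columns through a common table expression instead. It runs through Python's sqlite3 module and uses a shortened record list.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

values_rows = conn.execute("""
    WITH dummy_table(column1) AS (
        VALUES ('record1'), ('record2'), ('record3')
    )
    SELECT dummy_table.column1
    FROM dummy_table
    ORDER BY column1
""").fetchall()

print(values_rows)
```

This is handy for quick experiments because no table has to be created or cleaned up afterwards.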

Resetting sa password in SQL Server Express

  1. Open the Services application and look for SQL Server (SQLEXPRESS).
  2. Stop the SQL Server (SQLEXPRESS) service.
  3. Right-click the SQL Server (SQLEXPRESS) service and select Properties.
  4. In the Start parameters field, type -m.
  5. Click the Start button.
  6. Close the Services application.
  7. Open a Command Prompt in elevated mode.
  8. Execute the following command:
    osql -S <LOCAL_PC_NAME>\SQLEXPRESS -E

    Note: The <LOCAL_PC_NAME> must be changed appropriately.

  9. At the prompt opened by the previous command, run the following commands in sequence:
    sp_password NULL,'<NEW_PASSWORD>','sa'
    go
    quit

    Note: Change <NEW_PASSWORD> to your desired password.
  10. Close the elevated Command Prompt.
  11. Repeat steps 1 to 6, but in step 4 remove -m from the Start parameters field, if it exists.