SQL Server Architecture: An In-Depth Analysis
Microsoft SQL Server follows a client-server database architecture, where the SQL Server process initiates when a client sends a request to the server. The SQL Server then processes the request and, finally, responds to the client with the results of the processed data or any relevant information.
There are three main components of MS SQL Server, and let's explain the
each component in detail.
1. SQL Server Network Interface (Protocol Layer)
2. Relational Engine
3. Storage Engine
1. SQL Server Network Interface (SNI) Layer
It is referred to as the
SQL Server Network Interface, and it contains SQL Server protocols. The
following are the primary protocols associated with SQL Server:
a.
Shared
memory
b.
Named
pipes
c.
TCP/IP
d.
TDS
(Tabular Data Stream)
a. Shared Memory
Shared
memory is the simplest protocol and is used by default for local connections
when the client application and the SQL Server are located on the same machine.
b. Named Pipes
Named
pipes is a protocol that can be used if the client application and the SQL
Server are connected via a local area network (LAN). By default, this protocol
is disabled and default port is 445 for named pipes.
TCP/IP
is the primary protocol used to connect to SQL Server from a remote machine
using an IP address and a port number. The default TCP port for SQL Server is
1433. This protocol is required when the client application and the SQL Server
are on separate machines.
TDS,
or Tabular Data Stream, is a crucial component in the communication process
between client and server in SQL Server environments. All three
protocols—Shared Memory, Named Pipes, and TCP/IP—utilize TDS packets, which are
encapsulated within network packets. This encapsulation facilitates the
seamless transfer of data from the client machine to the server machine.
2. Relational Engine
The Relational Engine, also referred to as
the Query Processor, is responsible for handling the execution of queries as
they are received from client applications.
There are three major components of the
relational engine.
b. Optimizer
c. Query Executor
The CMD Parser is the first component of Relational
Engine which known as Query Parser. It receives the query data that sent from the protocol layer. The main task
of CMD Parser is to check syntactic and semantic errors and finally generate a
query tree.
Syntactic Error: Each
database has its own set of keywords and grammar rules. When a query is
received by the CMD parser, it first checks the keywords and grammar rules. If
the received query doesn't follow to the language syntax or grammar rules, it
returns an error.
Semantic Errors: The
semantic check analyzes whether the queried table and column exist in the
database schema. If they exist, the process of binding associates them with the
query. This association is known as binding. The CMS Parser throws an error
message if the queried table and column do not exist.
b. Optimizer
The work of the optimizer is to create an
execution plan for the user’s query. This is the plan that will determine how
the user query will be executed. Not all query need to optimize only DML
commands such as SELECT, INSERT, UPDATE, and DELETE that are marked and sent to
the Optimizer. The calculation of their cost is based on CPU and memory usage,
as well as input/output needs.
Phase 0: Trivial Plan
Initially, the Optimizer checks
for a straightforward plan. If there is none available, it moves on to the
first phase.
Phase 1: Transaction Processing Plan
In this step, the goal is to find
the best transaction processing plan. Initially, a straightforward plan with
one index per table is considered. If such a plan isn't sufficient, the
Optimizer looks for a more complex plan involving multiple indexes per table.
Phase 2: Parallel Processing & Optimization
If the Optimizer can't speed up
the query using the mentioned methods, it turns to parallel processing,
considering the capabilities and setup of the user's machine. If that's not
feasible, the final optimization kicks in, aiming to explore any other
potential options to execute the query in the most cost-effective way.
Finally, the Query Executor calls the access method,
initiating the retrieval of necessary data from the storage engine. Once this
data is obtained, it is sent back to the protocol layer. Ultimately, the data
is transmitted to the end user.
3. Storage Engine
The Storage Engine's primary task is to manage the storage and retrieval
of data from storage systems such as disks or SANs. It includes a buffer
manager and a transaction manager, both of which collaborate to interact with
data and log files based on the requirements of the query.
Data File Types
SQL Server stores its data, including user and system tables, indexes, and
SQL code such as stored procedures, views, and functions, in various types of
data files. These data files physically store data in pages, and each page has
a size of 8KB, which is the smallest data storage unit in SQL Server.
There are three data files.
a. Primary file
i. Every database contains one Primary file.
ii. This store all important data related to tables, views, Triggers, etc.
iii. Extension is .mdf usually but can be of any extension.
Ø b. Secondary file
ii. This is optional and contain user-specific data.
iii. Extension is .ndf usually but can be of any extension.
c. Log file
i. Extension is .ldfii. Used for Transaction Management.
Access Method
It acts as an interface between the query executor and the Buffer
Manager/Transaction Logs, determining whether the query is a SELECT or a
non-SELECT statement.
b. And if query if NON-SELECT statement, the query is pass to Transaction Manager. This mostly includes the UPDATE statement.
Buffer Manager
The core functions of buffer manager are given
below:
a. Plan Cache
b. Data Parsing: Buffer cache & Data storage
c. Dirty Page
a. Plan
Cache
The Manager first checks if there is an existing execution plan in the Plan Cache. If it finds one, it uses that plan along with its related data. However, for the first-time execution of a complex query, it's efficient to store the plan in the Plan Cache. This way, the SQL server can quickly access it the next time the same query comes in, ensuring faster availability.
b. Data
Parsing: Buffer cache & Data storage
The Buffer Manager allows access to the needed data. If there is already an execution plan in the Plan Cache, it is a soft parsing scenario. The data is available in the Data Cache and utilized by the Query Executor. This approach offers improved performance due to fewer I/O operations. However, in the absence of an execution plan in the Plan Cache, we encounter hard parsing. In this case, data has to be fetched from the storage.
c. Dirty
Page
It is saved as a
processing logic of Transaction Manager. More details about this will be
covered in the Transaction Manager section.
Transaction Manager
The Transaction Manager is invoked when the query comprises a non-SELECT statement.
Log Manager
- Log Manager keeps a track of all updates done in the system via logs in Transaction Logs.
- Logs have Logs Sequence Number with the Transaction ID and Data Modification Record.
- This is used for keeping track of Transaction Committed and Transaction Rollback.
Lock Manager
- During Transaction, the associated data in Data Storage is in the Lock state. This process is handled by Lock Manager.
- This process ensures data consistency and isolation. Also known as ACID properties.
Execution Process
- Log Manager Start logging and Lock
Manager locks the associated data.
- Data’s copy is maintained in the
Buffer cache.
- Copy of data supposed to be
updated is maintained in Log buffer and all the events updates data in
Data buffer.
- Pages which store the data is also
known as Dirty Pages.
- Checkpoint and Write-Ahead Logging
involve running a process that marks all dirty pages to be written to the
disk, although the pages still stay in the cache. This process occurs
roughly once per minute. However, before reaching the data page on the log
file, the page is initially sent from the Buffer log. This mechanism is
commonly known as Write-Ahead Logging.
- Lazy Writer handles situations where dirty pages can stay in memory. When the SQL server detects a significant load and requires buffer memory for a new transaction, it releases dirty pages from the cache. It follows the LRU (Least Recently Used) algorithm to clean pages from the buffer pool to disk.
- Log Manager keeps a track of all updates done in the system via logs in Transaction Logs.
- Logs have Logs Sequence Number with the Transaction ID and Data Modification Record.
- This is used for keeping track of Transaction Committed and Transaction Rollback.
Post a Comment
0 Comments