Read our post to learn how Data Architecture works and what its main fundamentals are!
What is Data Architecture?
Data Architecture is an area of Data Science that provides specifications for describing the current state of data, defines data requirements, guides data integration and controls data assets according to a strategy.
How does Data Architecture work?
Data Architecture involves techniques and strategies for managing data lifecycles in an enterprise, which serve to drive operational processes and decision-making. These strategies involve:
- selection: focuses on identifying which datasets are generated within the company and which are generated outside it;
- data infrastructure: analyzes and chooses data tools/platforms and associated data management services, and implements systems in data centers and the cloud, including network configuration;
- onboarding and data integration: involves assimilating data from outside sources, validating it against defined data quality factors, transforming it into usable formats and integrating it with data from internal business apps;
- data storage: considers the use of relational database management systems for structured data, comma-separated values and text files, NoSQL databases for unstructured and semi-structured data, cloud object storage services and Big Data structures;
- data usage: considers the diverse communities that consume data, analyzes their requirements and supports their usage scenarios;
- data access: concentrates on access methods such as direct query, data services and data extraction;
- data analysis and display: involves methods for organizing data in order to generate reports and analyses;
- data protection: involves perimeter security, encryption techniques and access controls based on attributes and roles;
- data governance: oversees compliance with the rules, models and policies that govern the collection, management and use of corporate data.
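The onboarding and integration step in the list above (validate external data against quality factors, then transform it into a usable format) can be sketched in a few lines of Python. This is only an illustration: the record fields (`customer_id`, `email`, `signup_date`), the three quality rules and the function names are assumptions made for the example, not part of any specific tool.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class CustomerRecord:
    """Hypothetical internal format for an onboarded customer."""
    customer_id: str
    email: str
    signup_date: date


def validate(raw: dict) -> list[str]:
    """Return the data quality problems found in one external record."""
    problems = []
    if not str(raw.get("customer_id") or "").strip():
        problems.append("missing customer_id")
    if "@" not in str(raw.get("email") or ""):
        problems.append("invalid email")
    try:
        date.fromisoformat(str(raw.get("signup_date") or ""))
    except ValueError:
        problems.append("invalid signup_date")
    return problems


def transform(raw: dict) -> CustomerRecord:
    """Convert a record that passed validation into the internal format."""
    return CustomerRecord(
        customer_id=raw["customer_id"].strip(),
        email=raw["email"].strip().lower(),
        signup_date=date.fromisoformat(raw["signup_date"]),
    )


def onboard(external_rows: list[dict]):
    """Split external rows into accepted records and rejected rows with reasons."""
    accepted, rejected = [], []
    for row in external_rows:
        problems = validate(row)
        if problems:
            rejected.append((row, problems))
        else:
            accepted.append(transform(row))
    return accepted, rejected


# Made-up sample data: one clean row and one that breaks every rule.
rows = [
    {"customer_id": " C-001 ", "email": "Ana@Example.com", "signup_date": "2023-04-01"},
    {"customer_id": "", "email": "no-at-sign", "signup_date": "01/04/2023"},
]
accepted, rejected = onboard(rows)
```

Keeping rejected rows together with the reasons for rejection, instead of silently dropping them, makes it possible to report on data quality later.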
What are the roles of a data architect?
A data architect has a set of roles tied to the objectives of the data strategy. Responsibilities span on-premises platforms as well as cloud data and application services. The roles of a data architect are:
- describing the parameters and principles that govern data environments such as cloud and hybrid environments;
- defining the management structures that will be used (Data Warehouse, Data Lake, Data Mesh) and the end-user data query and visualization tools;
- weighing operational demands and performance and cost expectations when planning a strategy for managing data and apps, particularly in the cloud;
- developing a catalog that lists enterprise data assets along with their characteristics, records where assets are located and which access controls apply, and classifies data sensitivity;
- supervising the use of modeling tools and techniques, guiding modelers in building models, supervising the processes that model data, and maintaining a set of metadata that captures intelligence about business data;
- overseeing the selection and implementation of data management tools that align with development processes and methods;
- developing and maintaining a reference architecture, which involves specifying the data domains used across applications and business lines;
- documenting how data travels from points of origin and acquisition into applications and systems, and overseeing the development, management and monitoring of data pipelines;
- describing techniques and processes for integrating data and choosing tools for implementing and overseeing integration efforts;
- specifying data access methods and architecting data services to support self-service downstream access for data scientists and analysts;
- documenting data quality rules and expectations, and selecting and implementing tools to manage and report compliance with data quality requirements;
- defining policies that protect data and choosing the right technologies to implement those policies;
- monitoring, auditing and reporting compliance with internal data parameters, externally determined regulations and policies, and performance expectations.
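One of the roles above, the catalog of enterprise data assets, lends itself to a small sketch. The version below is a minimal in-memory illustration, assuming three sensitivity levels and invented asset names, locations and roles; a real catalog would normally live in a dedicated tool.

```python
from dataclasses import dataclass, field
from enum import Enum


class Sensitivity(Enum):
    """Assumed classification levels; real policies may define others."""
    PUBLIC = 1
    INTERNAL = 2
    CONFIDENTIAL = 3


@dataclass
class CatalogEntry:
    name: str                     # asset name
    location: str                 # where the asset is stored
    owner: str                    # team responsible for the asset
    sensitivity: Sensitivity      # data sensitivity classification
    allowed_roles: set[str] = field(default_factory=set)  # access control


class DataCatalog:
    """Minimal in-memory catalog keyed by asset name."""

    def __init__(self) -> None:
        self._entries: dict[str, CatalogEntry] = {}

    def register(self, entry: CatalogEntry) -> None:
        self._entries[entry.name] = entry

    def locate(self, name: str) -> str:
        """Return the recorded location of an asset (KeyError if unknown)."""
        return self._entries[name].location

    def find_by_sensitivity(self, level: Sensitivity) -> list[str]:
        """List asset names classified at the given sensitivity level."""
        return sorted(e.name for e in self._entries.values() if e.sensitivity is level)


# Hypothetical assets, locations and roles, for illustration only.
catalog = DataCatalog()
catalog.register(CatalogEntry("sales_orders", "warehouse.sales.orders", "sales-team",
                              Sensitivity.INTERNAL, {"analyst", "data_scientist"}))
catalog.register(CatalogEntry("payroll", "warehouse.hr.payroll", "hr-team",
                              Sensitivity.CONFIDENTIAL, {"hr_manager"}))
confidential_assets = catalog.find_by_sensitivity(Sensitivity.CONFIDENTIAL)
```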
What are the main models of Data Architecture?
The database approach uses a common data model for the entire database. The DBMS (Database Management System) acts as the interface between user software and the database. Conceptually, there are three Data Architecture models. Look!
Hierarchical model
It is a model that presents data to users as a hierarchy of elements arranged in a sort of inverted tree. For example, when processing sales orders, a customer may have many invoices, each of which may have different data elements. In this case:
- the customer is the root level of data;
- invoice is the second level;
- line items (date, quantity, product, invoice number) are the third and final level.
Although it is an apparently simple structure, it has certain disadvantages, such as the difficulty in identifying which products a particular customer has purchased.
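The customer, invoice and line item hierarchy described above can be sketched with nested Python structures. The customer name, invoice numbers and products are made up for the example. Note that answering "which products did this customer buy?" requires walking every branch of the tree, which illustrates the disadvantage just mentioned.

```python
# Root level: the customer. Second level: invoices. Third level: line items.
customer = {
    "name": "Acme Ltd",
    "invoices": [
        {
            "invoice_number": "INV-1",
            "line_items": [
                {"date": "2024-01-10", "product": "keyboard", "quantity": 2},
                {"date": "2024-01-10", "product": "mouse", "quantity": 1},
            ],
        },
        {
            "invoice_number": "INV-2",
            "line_items": [
                {"date": "2024-02-03", "product": "keyboard", "quantity": 1},
            ],
        },
    ],
}


def products_purchased(customer_root: dict) -> set[str]:
    """Walk the whole tree, top to bottom, to collect the products bought."""
    products = set()
    for invoice in customer_root["invoices"]:
        for item in invoice["line_items"]:
            products.add(item["product"])
    return products


purchased = products_purchased(customer)
```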
Network model
In this model, records are not arranged in fixed levels. There is no single defined path for retrieving data, and the number of links between records is high, which makes network databases slow and complex.
Due to its complexity, this model is only applied when other options are not possible. An example of a network model is the relationship between employees and the sectors in which they have worked or will work in the future.
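A rough sketch of the employee and sector example: each employee can be linked to several sectors and each sector to several employees, so links have to be maintained in both directions. The class name, employee names and sector names below are hypothetical.

```python
from collections import defaultdict


class EmployeeSectorNetwork:
    """Many-to-many links between employees and sectors, stored both ways."""

    def __init__(self) -> None:
        self.sectors_of = defaultdict(set)    # employee -> linked sectors
        self.employees_in = defaultdict(set)  # sector -> linked employees

    def link(self, employee: str, sector: str) -> None:
        # Every relationship needs two links, which is where the
        # bookkeeping cost of the network model comes from.
        self.sectors_of[employee].add(sector)
        self.employees_in[sector].add(employee)


network = EmployeeSectorNetwork()
network.link("Maria", "Sales")     # sector where Maria worked
network.link("Maria", "Finance")   # sector where Maria will work
network.link("John", "Sales")

maria_sectors = sorted(network.sectors_of["Maria"])
sales_staff = sorted(network.employees_in["Sales"])
```

Data can be reached starting from either side (employee or sector), which reflects the absence of a single defined retrieval path.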
Relational model
It is the most current of the three models. It was created to avoid the inflexibility and complexity of the hierarchical and network models.
The relational model is simple and powerful. Each file is treated as a two-dimensional table that has many records (rows), each of which contains key items. Key items are data elements that identify the record.
In the above example of order processing, the most important data items are the customer ID, product code, and invoice number. Each file can be used individually in report generation. On the other hand, it is possible to get the data from any combination of files, since all the files are related to each other — hence the name “relational model”.
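The order-processing example can be sketched with Python's built-in `sqlite3` module. The table and column names and the sample rows are assumptions chosen for illustration. Each table is usable on its own, and a join over the key items (customer ID, invoice number, product code) combines them.

```python
import sqlite3

# In-memory database, so the example leaves nothing behind.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE invoices (
    invoice_number TEXT PRIMARY KEY,
    customer_id    INTEGER NOT NULL REFERENCES customers(customer_id)
);
CREATE TABLE line_items (
    invoice_number TEXT NOT NULL REFERENCES invoices(invoice_number),
    product_code   TEXT NOT NULL,
    quantity       INTEGER NOT NULL
);
""")

conn.execute("INSERT INTO customers VALUES (1, 'Acme Ltd')")
conn.executemany("INSERT INTO invoices VALUES (?, ?)",
                 [("INV-1", 1), ("INV-2", 1)])
conn.executemany("INSERT INTO line_items VALUES (?, ?, ?)",
                 [("INV-1", "KB-01", 2), ("INV-1", "MS-02", 1), ("INV-2", "KB-01", 1)])

# Which products has customer 1 purchased, and how many of each?
rows = conn.execute("""
    SELECT li.product_code, SUM(li.quantity)
    FROM customers AS c
    JOIN invoices   AS i  ON i.customer_id = c.customer_id
    JOIN line_items AS li ON li.invoice_number = i.invoice_number
    WHERE c.customer_id = ?
    GROUP BY li.product_code
    ORDER BY li.product_code
""", (1,)).fetchall()
```

The question that was awkward in the hierarchical model (which products a customer has purchased) becomes a single query here.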
How to structure a Data Architecture?
Several enterprise architecture frameworks can serve as a foundation for building a company’s Data Architecture. Follow along!
DAMA-DMBOK
DAMA International’s Data Management Body of Knowledge is a framework for managing data. It offers standard definitions for data management functions, deliverables, roles and other terminology, and presents guiding principles for managing data.
Zachman Framework for Enterprise Architecture
The Zachman Framework was developed by John Zachman in the 1980s at International Business Machines (IBM). It is a corporate ontology, which, in Information Technology, is a data model that represents the set of concepts that make up a domain and the relationships between them.
Ontologies are used in Semantic Web, Artificial Intelligence, Information Architecture and Software Engineering to represent knowledge about the world or some part of it.
The Zachman Framework includes a “Data” column that covers architecture patterns, a semantic model (business/conceptual data model), a business/logical data model, a physical data model and the actual databases.
TOGAF
TOGAF (The Open Group Architecture Framework) is an enterprise architecture method that provides a high-level framework for enterprise software development. In Phase C, the Data Architecture is developed and a roadmap for it is built.
What are the 8 principles of Data Architecture?
Data Architecture rests on eight principles: security, collaboration, intelligence, common vocabularies, flexibility, automation, curation and results orientation. Let’s talk about each of these pillars, which are key to structuring a good foundation.
1. Security
According to the principles of data governance, the system must guarantee security so that a company’s information is accessible only to authorized employees.
Thus, the Data Architecture needs to follow procedures that protect against intrusion and improper access, granting access to the most important data only to those who hold the right credentials.
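A minimal sketch of such a credential check, combining a role with a user attribute. The rule itself (the user's role must be allowed and the user must belong to the dataset's owning department) and all names are assumptions for illustration; actual policies depend on the company's governance rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    name: str
    role: str
    department: str


@dataclass(frozen=True)
class Dataset:
    name: str
    owning_department: str
    allowed_roles: frozenset[str]


def can_read(user: User, dataset: Dataset) -> bool:
    """Grant access only when both the role and the department match."""
    role_ok = user.role in dataset.allowed_roles
    attribute_ok = user.department == dataset.owning_department
    return role_ok and attribute_ok


# Hypothetical dataset and users.
payroll = Dataset("payroll", "hr", frozenset({"manager"}))
hr_manager = User("Maria", "manager", "hr")
sales_manager = User("John", "manager", "sales")

maria_allowed = can_read(hr_manager, payroll)
john_allowed = can_read(sales_manager, payroll)
```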
2. Collaboration
The horizontal management model is an increasingly consolidated trend. As a result, every company needs solutions that let different employees and teams manage, access and process data.
Data is an asset that needs to be shared. It is important to reduce departmental data pools so that everyone involved gets a systemic view of the entire organization.
In addition to breaking down departmental silos, modern Data Architecture values interfaces that make it easier for users to consume data with the tools best suited to their activities.
3. Intelligence
Smart buildings are already familiar to everyone. The same kind of intelligence is now expected of the systems that companies use to handle their strategic information.
Intelligence is the pillar that encompasses the concept of BI (Business Intelligence). Through it, decisions are grounded in structured data. Thus, it is the role of data architects to ensure that the organization has at its disposal not only raw data but also valuable information, whenever it is needed.
4. Common vocabularies
Shared data assets need a common vocabulary, which helps settle disagreements when the data is analyzed. These assets include, among others, fiscal calendar dimensions, product catalogs and indicator (KPI) definitions.
5. Flexibility
Despite all the security, a well-protected system is not rigid: it can be adapted as needs change. In the digital transformation, the Data Architecture must be flexible, allowing the system to scale and develop as the company grows.
There are situations, for example, in which new authorizations or unplanned accesses must be checked. The more these demands can be anticipated, the better.
6. Automation
Automated processes are a basic feature of digital tools. Based on this pillar, Data Architecture is committed to proposing effective solutions with higher levels of automation.
Automation optimizes data streams by reducing the number of times data needs to be moved. This makes it possible to cut costs, keep data more up to date and improve business agility.
7. Curation
It is also important to carry out data curation, a process that consists of modeling relevant relationships, cleaning raw data, and curating key dimensions and measures.
8. Results orientation
Theoretically, Data Architecture should be the ideal solution for all problems related to the use of technology. But it will only be effectively useful if it is guided by well-defined corporate goals.