
Friday, December 3, 2021

Azure Data Lake : 101

We are living in a digital world where data is everything, and the ability to process it and generate insights that enable business decision making is the superpower you want to have.

And in order to process data into meaningful information, it helps to have a place to store all kinds of data - and Microsoft provides one such storage service with Azure Data Lake.

So What is Azure Data Lake Storage? 

Azure Data Lake Storage (ADLS) can be compared to a large lake or pond where rain water passing through various terrains gets collected. Whether a stream passing through fields is muddy or a stream passing over a cluster of rocks is clean, the lake takes in the water as it comes. 

In the same way, ADLS can be considered a repository with the capacity to hold large amounts of data in its native, raw format. 

Data lakes can be terabytes or even petabytes in size, and the data can come from multiple heterogeneous sources. Structured, semi-structured and unstructured data can all be stored in a data lake in their original, un-transformed state. 
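
To make this concrete, here is a minimal sketch of landing a raw file in ADLS Gen2 using the Python SDK (azure-storage-file-datalake). The storage account, container and folder names below are placeholders for illustration only.

    # pip install azure-storage-file-datalake azure-identity
    from azure.identity import DefaultAzureCredential
    from azure.storage.filedatalake import DataLakeServiceClient

    # Hypothetical storage account URL and container ("file system") name.
    ACCOUNT_URL = "https://mydatalake.dfs.core.windows.net"
    FILE_SYSTEM = "raw"

    # DefaultAzureCredential picks up whatever identity is available
    # (Azure CLI login, managed identity, environment variables, ...).
    service_client = DataLakeServiceClient(ACCOUNT_URL, credential=DefaultAzureCredential())
    file_system_client = service_client.get_file_system_client(FILE_SYSTEM)

    # Land the file exactly as it is; the source/year/month folder layout is
    # just one common convention for a raw zone.
    file_client = file_system_client.get_file_client("sales/2021/12/orders.csv")
    with open("orders.csv", "rb") as data:
        file_client.upload_data(data, overwrite=True)

Notice that the file goes in as-is - no schema, no transformation - which is exactly the "native, raw format" idea described above.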

Advantages of a Data lake: 

  1. Faster ingestion than traditional ETL tools, since data is loaded as-is without up-front transformation
  2. Data is never thrown away
  3. Users can query and explore the data directly
  4. More flexible than a traditional data warehouse, as there is no demand to ingest only structured data

Then What is Azure Data Lake Storage Gen2? 

ADLS Gen2 converges the capabilities of ADLS Gen1 with Azure Blob storage. In essence it is Blob storage with a top-up: it provides file system semantics, file-level security and better scalability. 

All these additional capabilities of ADLS Gen2 are built on Azure Blob storage, so it also offers low-cost data storage, tiered storage and high availability (the disaster recovery capabilities of Blob storage are inherited).
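
To give an idea of what those file system semantics and file-level security look like in practice, here is a small sketch with the same Python SDK; the account name and the Azure AD group object ID in the ACL are placeholders.

    from azure.identity import DefaultAzureCredential
    from azure.storage.filedatalake import DataLakeServiceClient

    service_client = DataLakeServiceClient(
        "https://mydatalake.dfs.core.windows.net",  # hypothetical account
        credential=DefaultAzureCredential(),
    )
    file_system_client = service_client.get_file_system_client("raw")

    # Real directories, not just name prefixes: create one, then rename it in a
    # single metadata operation - something flat Blob storage cannot do.
    directory_client = file_system_client.create_directory("sales/2021/12")
    directory_client = directory_client.rename_directory("raw/sales/2021/december")

    # Folder/file-level security via POSIX-style ACLs; the GUID stands in for an
    # Azure AD group that should get read + execute access on this folder.
    directory_client.set_access_control(
        acl="user::rwx,group::r-x,other::---,"
            "group:00000000-0000-0000-0000-000000000000:r-x"
    )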

A lot more detail on data lakes can be found in the links shared below.

It is also important to understand that:

  • A data lake is usually the first stop in the data flow, so further processing of the raw data needs to be done using big data technologies (see the sketch after this list). 
  • Dumping raw data into a data lake comes with a responsibility to include governance and to ensure the quality of metadata. Data discovery and analytics capabilities should be developed in order to make proper use of the data stored in the lake.
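
As an example of that further processing, below is a sketch of reading the raw files back with PySpark and writing a cleaned-up copy to a curated zone. It assumes an Azure Synapse or Databricks notebook where a SparkSession named spark already exists and access to the (hypothetical) storage account is configured; the column name in the filter is illustrative only.

    # Read the raw CSV files straight from the data lake over abfss://.
    raw_orders = (
        spark.read
        .option("header", "true")
        .csv("abfss://raw@mydatalake.dfs.core.windows.net/sales/2021/12/")
    )

    # Basic cleanup before the data moves on to reporting or a warehouse.
    curated = raw_orders.dropDuplicates().filter(raw_orders["OrderStatus"] == "Invoiced")

    # Write the result to a separate "curated" container in the same lake.
    curated.write.mode("overwrite").parquet(
        "abfss://curated@mydatalake.dfs.core.windows.net/sales/2021/12/"
    )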

The Azure Data Lake add-in for Microsoft Dynamics 365 Finance and Operations is now also generally available, depending on which region you are in. The add-in pushes data out from D365FO into the data lake based on your configuration and setup. Installation details can be found in https://docs.microsoft.com/en-us/dynamics365/fin-ops-core/dev-itpro/data-entities/configure-export-data-lake

And once it is installed, the remaining setup details can be found in https://docs.microsoft.com/en-us/dynamics365/fin-ops-core/dev-itpro/data-entities/finance-data-azure-data-lake

It is important to understand that this is a fairly new feature. Microsoft is betting big on this approach going forward, so we can anticipate enhancements in the near future. More details and a better overview can be found in https://docs.microsoft.com/en-us/dynamics365/fin-ops-core/dev-itpro/data-entities/azure-data-lake-ga-version-overview


Microsoft has detailed information on Docs regarding data lakes - https://docs.microsoft.com/en-us/azure/architecture/data-guide/scenarios/data-lake

Read more on ADLS Gen1 in https://docs.microsoft.com/en-us/azure/data-lake-store/data-lake-store-overview

Details on ADLS Gen2 in https://docs.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-introduction

If you would like to try out setting up Azure Data Lake yourself, try this blog - it helped me set up mine - https://allaboutdynamic.com/2020/07/09/entity-store-in-azure-data-lake-d365-finance-operations/amp/

If you are in a location where you don't have the feature enabled, or if you would like to understand in detail what the add-in does, try and explore this GitHub link - https://github.com/microsoft/Dynamics-365-FastTrack-Implementation-Assets/tree/master/Analytics/AzureDataFactoryARMTemplates/SQLToADLSFullExport

However, if you are at the very beginning of understanding all these concepts, I would always recommend you go through MS Learn - https://docs.microsoft.com/en-us/learn/modules/introduction-to-azure-data-lake-storage/ 

Happy exploring. Good luck 😄


Wednesday, December 1, 2021

Azure Data Factory : 101

If you have ever worked with a data warehousing solution, you would probably say that the most important part of the job is to ensure proper data ingestion (data loading). If you lose any data at this point, the resulting information (reports) will end up inaccurate, failing to represent the facts on which business decisions are made.

Microsoft Azure provides several services which you can use to ingest data and one of them is Azure Data Factory.

So What is Azure Data Factory? 

Azure Data Factory (ADF) is a Platform-as-a-Service (PaaS) offering from Microsoft. Its primary purpose is to do Extract, Transform and Load (ETL) or Extract, Load and Transform (ELT), and this is done using the concept of pipelines. There are two types of pipelines to begin with: data movement pipelines (Extract & Load) and data transformation pipelines (Transform). And being a PaaS service, ADF automatically scales out based on the demand these pipelines place on it.
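
As a sketch of what a data movement pipeline looks like in code, the snippet below defines a pipeline with a single Copy activity and triggers a run through the azure-mgmt-datafactory Python SDK. The subscription, resource group, factory and dataset names are placeholders, and the two blob datasets are assumed to already exist; the same thing can of course be built entirely in the ADF authoring UI.

    # pip install azure-identity azure-mgmt-datafactory
    from azure.identity import DefaultAzureCredential
    from azure.mgmt.datafactory import DataFactoryManagementClient
    from azure.mgmt.datafactory.models import (
        BlobSink, BlobSource, CopyActivity, DatasetReference, PipelineResource,
    )

    # Hypothetical names - replace with your own subscription and resources.
    SUBSCRIPTION_ID = "<subscription-id>"
    RESOURCE_GROUP = "rg-analytics"
    FACTORY_NAME = "adf-demo"

    adf_client = DataFactoryManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)

    # A data movement (Extract & Load) pipeline: one Copy activity that moves
    # data between two datasets defined beforehand in the factory.
    copy_activity = CopyActivity(
        name="CopyRawToStaging",
        inputs=[DatasetReference(type="DatasetReference", reference_name="RawBlobDataset")],
        outputs=[DatasetReference(type="DatasetReference", reference_name="StagingBlobDataset")],
        source=BlobSource(),
        sink=BlobSink(),
    )
    adf_client.pipelines.create_or_update(
        RESOURCE_GROUP, FACTORY_NAME, "CopyRawToStagingPipeline",
        PipelineResource(activities=[copy_activity]),
    )

    # Trigger a run and look up its status (it will typically report
    # "Queued" or "InProgress" right after submission).
    run = adf_client.pipelines.create_run(
        RESOURCE_GROUP, FACTORY_NAME, "CopyRawToStagingPipeline", parameters={}
    )
    print(adf_client.pipeline_runs.get(RESOURCE_GROUP, FACTORY_NAME, run.run_id).status)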

ADF is ideal for working with structured data as well as unstructured data. It allows you to load raw data from many different sources, both on-premises and in the cloud. 

Like many other products from Microsoft, I would call ADF a collection of several tools packaged together, 

  1. For ease of understanding
  2. To eradicate unnecessary maintenance work
  3. To streamline the approach to be taken 

Microsoft has detailed information in Docs and you can probably start digging from https://docs.microsoft.com/en-us/azure/data-factory/introduction 

If you have the necessary details and would like to get started with Azure Data Factory already - you can start at https://azure.microsoft.com/en-us/services/data-factory/

An idea of the pricing can be found at https://azure.microsoft.com/en-us/pricing/details/data-factory/data-pipeline/

And if you are a hands-on person, there is a GitHub lab tutorial with all needed details in https://github.com/kromerm/adflab 

If you would like to have a poster on your wall reminding you of all things Azure Data Factory - then feel free to go to https://aka.ms/visual/azure-data-factory

And it is probably good to know that you can utilize up to 5 free low-frequency activities with Azure Data Factory by signing up for a free Azure account. More details in https://azure.microsoft.com/en-us/free/free-account-faq/

And if you are into Learn from Microsoft Docs - I would recommend going through https://docs.microsoft.com/en-us/learn/modules/explore-azure-synapse-analytics/ to get a better perspective on large-scale data analytics.