
Friday, December 3, 2021

Azure Data Lake: 101

We are living in a digital world where data is everything, and the ability to process it and generate insights that enable business decision making is the superpower you want to have.

And in order to turn data into meaningful information, it is good practice to have a place to store all kinds of data. Microsoft provides one such storage service with Azure Data Lake.

So What is Azure Data Lake Storage? 

Azure Data Lake Storage (ADLS) can be compared to a large lake or pond where rainwater passing through various terrains gets collected. Whether a stream passing through fields is muddy or a stream passing through a cluster of rocks is clean, the lake takes in the water as it comes. 

Just like that, ADLS can be considered a repository with the capacity to hold large amounts of data in its native, raw format. 

A data lake can be terabytes or even petabytes in size. Data can come from multiple heterogeneous sources (different in nature). Structured, semi-structured or unstructured data can all be stored in a data lake in its original, untransformed state. 
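To make that concrete, here is a minimal sketch in Python (using the azure-storage-file-datalake package) of dropping a raw file into the lake exactly as it arrives. The account name, key, container and file path are placeholders, not real values.

    # pip install azure-storage-file-datalake
    from azure.storage.filedatalake import DataLakeServiceClient

    # Placeholder account details - replace with your own storage account and key
    service = DataLakeServiceClient(
        account_url="https://<your-account>.dfs.core.windows.net",
        credential="<your-account-key>",
    )

    # "raw" is an assumed container (file system) used as the landing zone
    file_system = service.get_file_system_client("raw")

    # Store the file exactly as it arrived - native JSON, no transformation
    file_client = file_system.get_file_client("sales/2021/12/orders.json")
    with open("orders.json", "rb") as source:
        file_client.upload_data(source, overwrite=True)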

Advantages of a Data lake: 

  1. Faster data ingestion than traditional ETL pipelines, since data is loaded as-is without upfront transformation
  2. Data is never thrown away
  3. Users can query and explore the data directly (see the sketch after this list)
  4. More flexible than a traditional data warehouse, as there is no requirement to ingest only structured data
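As a small illustration of point 3 above, the same SDK lets you walk through whatever has landed in the lake. This is only a sketch; the "raw" container and the "sales" folder are assumptions carried over from the upload example earlier.

    from azure.storage.filedatalake import DataLakeServiceClient

    service = DataLakeServiceClient(
        account_url="https://<your-account>.dfs.core.windows.net",
        credential="<your-account-key>",
    )
    file_system = service.get_file_system_client("raw")

    # List everything under the assumed "sales" folder, recursively
    for path in file_system.get_paths(path="sales", recursive=True):
        kind = "dir " if path.is_directory else "file"
        print(kind, path.name)
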
Then What is Azure Data Lake Storage Gen2? 

ADLS Gen2 converges the capabilities of ADLS Gen1 with Azure Blob Storage. So it basically comes with a top-up: it provides file system semantics (a hierarchical namespace), file-level security and better scalability. 

All these additional capabilities of ADLS Gen2 are built on Azure Blob Storage, so it also supports low-cost data storage, tiered storage and high availability (Blob Storage disaster recovery capabilities are inherited).
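To see what those file system semantics and file-level security look like in practice, here is a hedged sketch that creates a directory hierarchy and puts a POSIX-style permission on it - something a flat blob namespace alone does not give you. The directory name and permission value are illustrative assumptions.

    from azure.storage.filedatalake import DataLakeServiceClient

    service = DataLakeServiceClient(
        account_url="https://<your-account>.dfs.core.windows.net",
        credential="<your-account-key>",
    )
    file_system = service.get_file_system_client("raw")

    # Real directories, courtesy of the Gen2 hierarchical namespace
    directory = file_system.get_directory_client("finance/general-ledger")
    directory.create_directory()

    # POSIX-style permissions at directory/file level (illustrative value)
    directory.set_access_control(permissions="rwxr-x---")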

A lot more detail on data lakes can be found in the links shared below.

Also, it is important to understand that: 

  • A data lake is usually the first stop in the data flow, so further processing of the raw data needs to be done using big data technologies. 
  • Dumping raw data into a data lake comes with a responsibility to put governance in place and to ensure the quality of metadata. Data discovery and analytics capabilities should be developed in order to make proper use of the data stored in the lake (a small metadata tagging sketch follows below).
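One lightweight way to start on that metadata responsibility is to tag each file as it lands so it can be discovered later. A minimal sketch, assuming the same placeholder account and file as earlier; the metadata keys and values are purely illustrative.

    from azure.storage.filedatalake import DataLakeServiceClient

    service = DataLakeServiceClient(
        account_url="https://<your-account>.dfs.core.windows.net",
        credential="<your-account-key>",
    )
    file_client = service.get_file_system_client("raw").get_file_client(
        "sales/2021/12/orders.json"
    )

    # Attach basic lineage metadata to the raw file (illustrative keys/values)
    file_client.set_metadata({
        "source_system": "webshop",
        "ingested_on": "2021-12-03",
        "owner": "sales-analytics",
    })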

The Azure Data Lake add-in for Microsoft Dynamics 365 Finance and Operations is also now generally available, depending on which part of the planet you are in. This add-in basically pushes data out from D365FO into the data lake based on the configuration and setup. Installation details can be found in https://docs.microsoft.com/en-us/dynamics365/fin-ops-core/dev-itpro/data-entities/configure-export-data-lake

And once it is installed, details of the further setup can be found in https://docs.microsoft.com/en-us/dynamics365/fin-ops-core/dev-itpro/data-entities/finance-data-azure-data-lake

It is important to understand that this is a fairly new feature. Microsoft is promising big things for this approach going forward, so we can anticipate enhancements in the near future. More details and a better overview can be found in https://docs.microsoft.com/en-us/dynamics365/fin-ops-core/dev-itpro/data-entities/azure-data-lake-ga-version-overview


Microsoft has detailed information on Docs regarding data lakes - https://docs.microsoft.com/en-us/azure/architecture/data-guide/scenarios/data-lake

Read more on ADLS Gen1 in https://docs.microsoft.com/en-us/azure/data-lake-store/data-lake-store-overview

Details on ADLS Gen2 in https://docs.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-introduction

If you would like to try out setting up Azure Data Lake yourself, try this blog - it helped me set up mine - https://allaboutdynamic.com/2020/07/09/entity-store-in-azure-data-lake-d365-finance-operations/amp/

If you are in a location where the feature is not yet enabled, or if you would like to understand the details of what the add-in actually does, try and explore this GitHub link - https://github.com/microsoft/Dynamics-365-FastTrack-Implementation-Assets/tree/master/Analytics/AzureDataFactoryARMTemplates/SQLToADLSFullExport

However, if you are at the very beginning of understanding all these concepts, I would always recommend going through MS Learn - https://docs.microsoft.com/en-us/learn/modules/introduction-to-azure-data-lake-storage/ 

Happy exploring. Good luck 😄

