DP-203 Data Engineering on Microsoft Azure Dumps

If you are looking for free DP-203 dumps, here are some sample questions and answers. You can prepare from our Microsoft DP-203 exam question notes and practice with this test. Check our updated DP-203 exam dumps below.

DumpsGroup is a top-class study material provider, and our comprehensive range of DP-203 real exam questions can be your key to passing the Microsoft Azure Data Engineer Associate certification exam on the first attempt. Our material covers almost all the topics of the Microsoft DP-203 exam and is available in Microsoft DP-203 PDF and DP-203 practice test engine formats designed to resemble the real exam questions. Free DP-203 questions and answers and free Microsoft DP-203 study material are available here so you can judge the quality and accuracy of our material.



Sample Question 4

Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution. After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen.

You have an Azure Data Lake Storage account that contains a staging zone.

You need to design a daily process to ingest incremental data from the staging zone, transform the data by executing an R script, and then insert the transformed data into a data warehouse in Azure Synapse Analytics.

Solution: You schedule an Azure Databricks job that executes an R notebook, and then inserts the data into the data warehouse.

Does this meet the goal?

A. Yes
B. No


Sample Question 5

You plan to use an Apache Spark pool in Azure Synapse Analytics to load data to an Azure Data Lake Storage Gen2 account.

You need to recommend which file format to use to store the data in the Data Lake Storage account. The solution must meet the following requirements:
• Column names and data types must be defined within the files loaded to the Data Lake Storage account.
• Data must be accessible by using queries from an Azure Synapse Analytics serverless SQL pool.
• Partition elimination must be supported without having to specify a specific partition.

What should you recommend?

A. Delta Lake
B. JSON
C. CSV
D. ORC
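
For context on the partition-elimination requirement: a serverless SQL pool can prune partition folders with the filepath() function instead of naming a specific partition. A minimal sketch, assuming a hypothetical storage URL and a year=/month= folder layout:

```sql
-- Serverless SQL pool (T-SQL); the storage URL and folder layout are hypothetical
SELECT
    r.filepath(1) AS sales_year,
    COUNT(*)      AS row_count
FROM OPENROWSET(
        BULK 'https://datalake1.dfs.core.windows.net/files/year=*/month=*/*.parquet',
        FORMAT = 'PARQUET'
     ) AS r
WHERE r.filepath(1) = '2024'   -- only folders matching the first wildcard are scanned
GROUP BY r.filepath(1);
```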


Sample Question 6

You are designing a solution that will use tables in Delta Lake on Azure Databricks.

You need to minimize how long it takes to perform the following:
• Queries against non-partitioned tables
• Joins on non-partitioned columns

Which two options should you include in the solution? Each correct answer presents part of the solution. (Choose the correct answers and give an explanation with references, based on Data Engineering on Microsoft Azure.)

A. Z-Ordering
B. Apache Spark caching
C. dynamic file pruning (DFP)
D. the clone command
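
If Z-Ordering and Apache Spark caching are the intended options, the corresponding Delta Lake and Spark SQL statements look roughly like this; the table and column names are hypothetical:

```sql
-- Databricks Spark SQL; table and column names are hypothetical
-- Z-Ordering co-locates related values in fewer files, which speeds up scans
-- and joins on non-partitioned columns.
OPTIMIZE transactions
ZORDER BY (customer_id);

-- Cache the table so repeated queries and joins avoid re-reading cloud storage.
CACHE TABLE transactions;
```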


Sample Question 7

Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution. After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen.

You are designing an Azure Stream Analytics solution that will analyze Twitter data.

You need to count the tweets in each 10-second window. The solution must ensure that each tweet is counted only once.

Solution: You use a tumbling window, and you set the window size to 10 seconds.

Does this meet the goal?

A. Yes
B. No
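
A tumbling window is fixed-size and non-overlapping, so each event falls into exactly one window. A minimal Stream Analytics query sketch, with hypothetical input, output, and timestamp column names:

```sql
-- Azure Stream Analytics query language; input, output, and column names are hypothetical
SELECT
    COUNT(*)           AS TweetCount,
    System.Timestamp() AS WindowEnd
INTO [TweetCounts]
FROM [TwitterStream] TIMESTAMP BY CreatedAt
GROUP BY TumblingWindow(second, 10)
```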


Sample Question 8

You have an Azure subscription that contains an Azure Blob Storage account named storage1 and an Azure Synapse Analytics dedicated SQL pool named Pool1.

You need to store data in storage1. The data will be read by Pool1. The solution must meet the following requirements:
• Enable Pool1 to skip columns and rows that are unnecessary in a query.
• Automatically create column statistics.
• Minimize the size of files.

Which type of file should you use?

A. JSON
B. Parquet
C. Avro
D. CSV


Sample Question 9

You have an Azure Databricks workspace that contains a Delta Lake dimension table named Table1. Table1 is a Type 2 slowly changing dimension (SCD) table. You need to apply updates from a source table to Table1. Which Apache Spark SQL operation should you use?

A. CREATE
B. UPDATE
C. MERGE
D. ALTER
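
If MERGE is the intended operation, a simplified Spark SQL sketch of a Type 2 update follows; the table and column names are hypothetical, and a full Type 2 load typically also stages changed rows so that a new current version is inserted alongside the expired one:

```sql
-- Spark SQL (Delta Lake) on Azure Databricks; names are hypothetical
MERGE INTO dim_customer AS t
USING customer_updates AS s
    ON t.customer_id = s.customer_id AND t.is_current = true
WHEN MATCHED AND t.address <> s.address THEN
    UPDATE SET is_current = false, end_date = current_date()   -- expire the old version
WHEN NOT MATCHED THEN
    INSERT (customer_id, address, is_current, start_date, end_date)
    VALUES (s.customer_id, s.address, true, current_date(), NULL);
```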


Sample Question 10

You have an Azure Synapse Analytics dedicated SQL pool named Pool1. Pool1 contains a table named table1. You load 5 TB of data into table1.

You need to ensure that columnstore compression is maximized for table1.

Which statement should you execute?

A. ALTER INDEX ALL on table1 REORGANIZE
B. ALTER INDEX ALL on table1 REBUILD
C. DBCC DBREINDEX (table1)
D. DBCC INDEXDEFRAG (pool1, table1)
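
For reference, rebuilding the clustered columnstore index forces all rows out of the delta store and recompresses every rowgroup, whereas REORGANIZE only compresses closed delta rowgroups online. A minimal T-SQL sketch:

```sql
-- T-SQL on the dedicated SQL pool
-- REBUILD drops and recreates the columnstore index, recompressing all rowgroups.
ALTER INDEX ALL ON dbo.table1 REBUILD;
```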


Sample Question 11

You have two Azure Blob Storage accounts named account1 and account2.

You plan to create an Azure Data Factory pipeline that will use scheduled intervals to replicate newly created or modified blobs from account1 to account2.

You need to recommend a solution to implement the pipeline. The solution must meet the following requirements:
• Ensure that the pipeline only copies blobs that were created or modified since the most recent replication event.
• Minimize the effort to create the pipeline.

What should you recommend?

A. Create a pipeline that contains a flowlet.
B. Create a pipeline that contains a Data Flow activity.
C. Run the Copy Data tool and select Metadata-driven copy task.
D. Run the Copy Data tool and select Built-in copy task.


Sample Question 12

You have an Azure Data Factory pipeline named pipeline1 that is invoked by a tumbling window trigger named Trigger1. Trigger1 has a recurrence of 60 minutes.

You need to ensure that pipeline1 will execute only if the previous execution completes successfully.

How should you configure the self-dependency for Trigger1?

A. offset: "-00:01:00" size: "00:01:00"
B. offset: "01:00:00" size: "-01:00:00"
C. offset: "01:00:00" size: "01:00:00"
D. offset: "-01:00:00" size: "01:00:00"


Sample Question 13

You are building a data flow in Azure Data Factory that upserts data into a table in an Azure Synapse Analytics dedicated SQL pool.

You need to add a transformation to the data flow. The transformation must specify logic indicating when a row from the input data must be upserted into the sink.

Which type of transformation should you add to the data flow?

A. join
B. select
C. surrogate key
D. alter row


Sample Question 14

You have an Azure Data Lake Storage account that contains a staging zone.

You need to design a daily process to ingest incremental data from the staging zone, transform the data by executing an R script, and then insert the transformed data into a data warehouse in Azure Synapse Analytics.

Solution: You use an Azure Data Factory schedule trigger to execute a pipeline that executes an Azure Databricks notebook, and then inserts the data into the data warehouse.

Does this meet the goal?

A. Yes
B. No


Sample Question 15

You are designing an Azure Data Lake Storage solution that will transform raw JSON files for use in an analytical workload.

You need to recommend a format for the transformed files. The solution must meet the following requirements:
• Contain information about the data types of each column in the files.
• Support querying a subset of columns in the files.
• Support read-heavy analytical workloads.
• Minimize the file size.

What should you recommend?

A. JSON
B. CSV
C. Apache Avro
D. Apache Parquet


Sample Question 16

You have an Azure subscription that contains an Azure Synapse Analytics workspace named ws1 and an Azure Cosmos DB database account named Cosmos1. Cosmos1 contains a container named container1, and ws1 contains a serverless SQL pool named serverless1.

You need to ensure that you can query the data in container1 by using serverless1.

Which three actions should you perform? Each correct answer presents part of the solution. NOTE: Each correct selection is worth one point.

A. Enable Azure Synapse Link for Cosmos1
B. Disable the analytical store for container1.
C. In ws1, create a linked service that references Cosmos1
D. Enable the analytical store for container1
E. Disable indexing for container1
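
Once Azure Synapse Link and the analytical store are enabled and a linked service exists, the serverless SQL pool can read container1 with OPENROWSET. A minimal sketch; the connection string and credential name are hypothetical:

```sql
-- Serverless SQL pool (T-SQL); account, database, container, and credential names are hypothetical
SELECT TOP 10 *
FROM OPENROWSET(
        PROVIDER = 'CosmosDB',
        CONNECTION = 'Account=cosmos1;Database=db1',
        OBJECT = 'container1',
        SERVER_CREDENTIAL = 'cosmos1_credential'
     ) AS documents;
```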


Sample Question 17

You are designing a folder structure for the files in an Azure Data Lake Storage Gen2 account. The account has one container that contains three years of data.

You need to recommend a folder structure that meets the following requirements:
• Supports partition elimination for queries by Azure Synapse Analytics serverless SQL pools
• Supports fast data retrieval for data from the current month
• Simplifies data security management by department

Which folder structure should you recommend?

A. \YYYY\MM\DD\Department\DataSource\DataFile_YYYYMMDD.parquet
B. \Department\DataSource\YYYY\MM\DataFile_YYYYMMDD.parquet
C. \DD\MM\YYYY\Department\DataSource\DataFile_DDMMYY.parquet
D. \DataSource\Department\YYYYMM\DataFile_YYYYMMDD.parquet


Sample Question 18

You have an Azure Synapse Analytics dedicated SQL pool.

You need to create a pipeline that will execute a stored procedure in the dedicated SQL pool and use the returned result set as the input for a downstream activity. The solution must minimize development effort.

Which type of activity should you use in the pipeline?

A. Notebook
B. U-SQL
C. Script
D. Stored Procedure


Sample Question 19

You have an Azure Synapse Analytics dedicated SQL pool that contains a table named Table1. Table1 contains the following:
• One billion rows
• A clustered columnstore index
• A hash-distributed column named ProductKey
• A column named SalesDate that is of the date data type and cannot be null

Thirty million rows will be added to Table1 each month.

You need to partition Table1 based on the SalesDate column. The solution must optimize query performance and data loading.

How often should you create a partition?

A. once per month
B. once per year
C. once per day
D. once per week
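
The usual sizing logic: a dedicated SQL pool spreads every partition across 60 distributions, and columnstore rowgroups compress best with roughly a million or more rows per distribution per partition. At 30 million rows per month, yearly partitions hold about 360 million rows, or roughly 6 million rows per distribution, while monthly partitions would drop that to about 500,000. A minimal DDL sketch, assuming hypothetical boundary dates and an abbreviated column list:

```sql
-- T-SQL on the dedicated SQL pool; columns and boundary values are illustrative
CREATE TABLE dbo.Table1
(
    ProductKey INT  NOT NULL,
    SalesDate  DATE NOT NULL
    -- remaining fact columns omitted
)
WITH
(
    DISTRIBUTION = HASH (ProductKey),
    CLUSTERED COLUMNSTORE INDEX,
    PARTITION ( SalesDate RANGE RIGHT FOR VALUES
        ('2023-01-01', '2024-01-01', '2025-01-01') )   -- one partition per year
);
```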


Sample Question 20

You have an Azure Databricks workspace named workspace1 in the Standard pricing tier. Workspace1 contains an all-purpose cluster named cluster1.

You need to reduce the time it takes for cluster1 to start and scale up. The solution must minimize costs.

What should you do first?

A. Upgrade workspace1 to the Premium pricing tier.
B. Create a cluster policy in workspace1.
C. Create a pool in workspace1.
D. Configure a global init script for workspace1.


Sample Question 21

What should you recommend to prevent users outside the Litware on-premises network from accessing the analytical data store?

A. a server-level virtual network rule
B. a database-level virtual network rule
C. a database-level firewall IP rule
D. a server-level firewall IP rule


Sample Question 22

What should you recommend using to secure sensitive customer contact information?

A. data labels
B. column-level security
C. row-level security
D. Transparent Data Encryption (TDE)


Sample Question 23

You have an Azure subscription that contains an Azure Data Lake Storage account named myaccount1. The myaccount1 account contains two containers named container1 and container2. The subscription is linked to an Azure Active Directory (Azure AD) tenant that contains a security group named Group1.

You need to grant Group1 read access to container1. The solution must use the principle of least privilege.

Which role should you assign to Group1?

A. Storage Blob Data Reader for container1 
B. Storage Table Data Reader for container1 
C. Storage Blob Data Reader for myaccount1 
D. Storage Table Data Reader for myaccount1 


Sample Question 24

You are designing an application that will use an Azure Data Lake Storage Gen2 account to store petabytes of license plate photos from toll booths. The account will use zone-redundant storage (ZRS).

You identify the following usage patterns:
• The data will be accessed several times a day during the first 30 days after the data is created. The data must meet an availability SLA of 99.9%.
• After 90 days, the data will be accessed infrequently but must be available within 30 seconds.
• After 365 days, the data will be accessed infrequently but must be available within five minutes.


Sample Question 25

You are designing a database for an Azure Synapse Analytics dedicated SQL pool to support workloads for detecting ecommerce transaction fraud. Data will be combined from multiple ecommerce sites and can include sensitive financial information such as credit card numbers.

You need to recommend a solution that meets the following requirements:
• Users must be able to identify potentially fraudulent transactions.
• Users must be able to use credit cards as a potential feature in models.
• Users must NOT be able to access the actual credit card numbers.

What should you include in the recommendation?

A. Transparent Data Encryption (TDE) 
B. row-level security (RLS) 
C. column-level encryption 
D. Azure Active Directory (Azure AD) pass-through authentication 


Sample Question 26

You have an Azure Synapse Analytics dedicated SQL pool.

You need to create a fact table named Table1 that will store sales data from the last three years. The solution must be optimized for the following query operations:
• Show order counts by week.
• Calculate sales totals by region.
• Calculate sales totals by product.
• Find all the orders from a given month.

Which data should you use to partition Table1?

A. region 
B. product 
C. week
D. month


Sample Question 27

You plan to create a dimension table in Azure Synapse Analytics that will be less than 1 GB.

You need to create the table to meet the following requirements:
• Provide the fastest query time.
• Minimize data movement during queries.

Which type of table should you use?

A. hash distributed 
B. heap
C. replicated
D. round-robin
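
For context, a replicated table keeps a full copy of a small dimension on every Compute node, so joins against it require no data movement. A minimal sketch with hypothetical columns:

```sql
-- T-SQL on the dedicated SQL pool; table and column names are hypothetical
CREATE TABLE dbo.DimProduct
(
    ProductKey  INT           NOT NULL,
    ProductName NVARCHAR(100) NOT NULL
)
WITH
(
    DISTRIBUTION = REPLICATE,        -- full copy on every Compute node; no movement during joins
    CLUSTERED COLUMNSTORE INDEX
);
```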


Sample Question 28

You are designing an Azure Databricks interactive cluster. The cluster will be used infrequently and will be configured for auto-termination. You need to ensure that the cluster configuration is retained indefinitely after the cluster is terminated. The solution must minimize costs. What should you do? 

A. Clone the cluster after it is terminated. 
B. Terminate the cluster manually when processing completes.
C. Create an Azure runbook that starts the cluster every 90 days.
D. Pin the cluster.


Sample Question 29

You have an Azure Databricks workspace and an Azure Data Lake Storage Gen2 account named storage1. New files are uploaded daily to storage1.

You need to recommend a solution that configures storage1 as a structured streaming source. The solution must meet the following requirements:
• Incrementally process new files as they are uploaded to storage1.
• Minimize implementation and maintenance effort.
• Minimize the cost of processing millions of files.
• Support schema inference and schema drift.

Which should you include in the recommendation?

A. Auto Loader 
B. Apache Spark FileStreamSource 
C. COPY INTO 
D. Azure Data Factory 


Sample Question 30

You have an activity in an Azure Data Factory pipeline. The activity calls a stored procedure in a data warehouse in Azure Synapse Analytics and runs daily. You need to verify the duration of the activity when it ran last. What should you use?

A. activity runs in Azure Monitor 
B. Activity log in Azure Synapse Analytics 
C. the sys.dm_pdw_wait_stats data management view in Azure Synapse Analytics 
D. an Azure Resource Manager template 


Sample Question 31

You are designing a highly available Azure Data Lake Storage solution that will use geo-zone-redundant storage (GZRS).

You need to monitor for replication delays that can affect the recovery point objective (RPO).

What should you include in the monitoring solution?

A. Last Sync Time 
B. Average Success Latency 
C. Error errors 
D. availability 


Sample Question 32

Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution. After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen.

You have an Azure Synapse Analytics dedicated SQL pool that contains a table named Table1. You have files that are ingested and loaded into an Azure Data Lake Storage Gen2 container named container1.

You plan to insert data from the files in container1 into Table1 and transform the data. Each row of data in the files will produce one row in the serving layer of Table1.

You need to ensure that when the source data files are loaded to container1, the DateTime is stored as an additional column in Table1.

Solution: You use an Azure Synapse Analytics serverless SQL pool to create an external table that has an additional DateTime column.

Does this meet the goal?

A. Yes 
B. No 


Sample Question 33

You have an Azure Stream Analytics job. You need to ensure that the job has enough streaming units provisioned. You configure monitoring of the SU % Utilization metric. Which two additional metrics should you monitor? Each correct answer presents part of the solution. NOTE: Each correct selection is worth one point.

A. Backlogged Input Events 
B. Watermark Delay 
C. Function Events 
D. Out of order Events 
E. Late Input Events 


Sample Question 34

A company uses Azure Stream Analytics to monitor devices. The company plans to double the number of devices that are monitored. You need to monitor a Stream Analytics job to ensure that there are enough processing resources to handle the additional load. Which metric should you monitor?

A. Early Input Events 
B. Late Input Events 
C. Watermark delay 
D. Input Deserialization Errors 


Sample Question 35

You are designing an enterprise data warehouse in Azure Synapse Analytics that will contain a table named Customers. Customers will contain credit card information.

You need to recommend a solution to provide salespeople with the ability to view all the entries in Customers. The solution must prevent all the salespeople from viewing or inferring the credit card information.

What should you include in the recommendation?

A. data masking 
B. Always Encrypted 
C. column-level security 
D. row-level security 
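
If column-level security is the intended control, access is granted per column so the sensitive column simply cannot be selected. A minimal T-SQL sketch; the table, columns, and role are hypothetical:

```sql
-- T-SQL; table, column, and role names are hypothetical
GRANT SELECT ON dbo.Customers
    (CustomerId, FirstName, LastName, Email)   -- the credit card column is deliberately excluded
TO SalesRole;
-- Members of SalesRole can query the listed columns; any reference to the
-- credit card column fails with a permission error.
```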


Sample Question 36

You have an Azure Databricks resource.

You need to log actions that relate to changes in compute for the Databricks resource.

Which Databricks services should you log?

A. clusters 
B. workspace 
C. DBFS 
D. SSH 
E. Jobs 


Sample Question 37

You have an Azure Data Lake Storage account that contains a staging zone.

You need to design a daily process to ingest incremental data from the staging zone, transform the data by executing an R script, and then insert the transformed data into a data warehouse in Azure Synapse Analytics.

Solution: You use an Azure Data Factory schedule trigger to execute a pipeline that executes an Azure Databricks notebook, and then inserts the data into the data warehouse.

Does this meet the goal?

A. Yes 
B. No 


Sample Question 38

You plan to build a structured streaming solution in Azure Databricks. The solution will count new events in five-minute intervals and report only events that arrive during the interval. The output will be sent to a Delta Lake table.

Which output mode should you use?

A. complete 
B. update 
C. append 


Sample Question 39

You need to trigger an Azure Data Factory pipeline when a file arrives in an Azure Data Lake Storage Gen2 container.

Which resource provider should you enable?

A. Microsoft.Sql 
B. Microsoft.Automation 
C. Microsoft.EventGrid 
D. Microsoft.EventHub 


Sample Question 40

You are designing an Azure Databricks interactive cluster. The cluster will be used infrequently and will be configured for auto-termination.

You need to ensure that the cluster configuration is retained indefinitely after the cluster is terminated. The solution must minimize costs.

What should you do?

A. Clone the cluster after it is terminated. 
B. Terminate the cluster manually when processing completes. 
C. Create an Azure runbook that starts the cluster every 90 days. 
D. Pin the cluster. 


Sample Question 41

You have an enterprise data warehouse in Azure Synapse Analytics named DW1 on a server named Server1.

You need to verify whether the size of the transaction log file for each distribution of DW1 is smaller than 160 GB.

What should you do?

A. On the master database, execute a query against the sys.dm_pdw_nodes_os_performance_counters dynamic management view. 
B. From Azure Monitor in the Azure portal, execute a query against the logs of DW1. 
C. On DW1, execute a query against the sys.database_files dynamic management view. 
D. Execute a query against the logs of DW1 by using the Get-AzOperationalInsightSearchResult PowerShell cmdlet. 
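
If the performance-counter DMV is the intended approach, a sketch of the per-distribution log size check follows; the counter value is reported in KB, so it is converted to GB for the 160 GB comparison:

```sql
-- T-SQL; per-distribution transaction log size via the performance-counter DMV
SELECT
    instance_name              AS distribution_db,
    cntr_value * 1.0 / 1048576 AS log_file_size_gb,    -- counter value is in KB
    pdw_node_id
FROM sys.dm_pdw_nodes_os_performance_counters
WHERE counter_name = 'Log File(s) Size (KB)'
  AND instance_name LIKE 'Distribution_%';
```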


Sample Question 42

You are designing a financial transactions table in an Azure Synapse Analytics dedicated SQL pool. The table will have a clustered columnstore index and will include the following columns:
• TransactionType: 40 million rows per transaction type
• CustomerSegment: 4 million rows per customer segment
• TransactionMonth: 65 million rows per month
• AccountType: 500 million rows per account type

You have the following query requirements:
• Analysts will most commonly analyze transactions for a given month.
• Transaction analysis will typically summarize transactions by transaction type, customer segment, and/or account type.

You need to recommend a partition strategy for the table to minimize query times.

On which column should you recommend partitioning the table?

A. CustomerSegment 
B. AccountType 
C. TransactionType 
D. TransactionMonth 


Sample Question 43

You plan to ingest streaming social media data by using Azure Stream Analytics. The data will be stored in files in Azure Data Lake Storage, and then consumed by using Azure Databricks and PolyBase in Azure Synapse Analytics.

You need to recommend a Stream Analytics data output format to ensure that the queries from Databricks and PolyBase against the files encounter the fewest possible errors. The solution must ensure that the files can be queried quickly and that the data type information is retained.

What should you recommend?

A. Parquet 
B. Avro 
C. CSV 
D. JSON 


Sample Question 44

Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution. After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen.

You plan to create an Azure Databricks workspace that has a tiered structure. The workspace will contain the following three workloads:
• A workload for data engineers who will use Python and SQL.
• A workload for jobs that will run notebooks that use Python, Scala, and SQL.
• A workload that data scientists will use to perform ad hoc analysis in Scala and R.

The enterprise architecture team at your company identifies the following standards for Databricks environments:
• The data engineers must share a cluster.
• The job cluster will be managed by using a request process whereby data scientists and data engineers provide packaged notebooks for deployment to the cluster.
• All the data scientists must be assigned their own cluster that terminates automatically after 120 minutes of inactivity. Currently, there are three data scientists.

You need to create the Databricks clusters for the workloads.

Solution: You create a Standard cluster for each data scientist, a High Concurrency cluster for the data engineers, and a Standard cluster for the jobs.

Does this meet the goal?

A. Yes 
B. No 


Sample Question 45

You have an Azure Stream Analytics job.

You need to ensure that the job has enough streaming units provisioned. You configure monitoring of the SU % Utilization metric.

Which two additional metrics should you monitor? Each correct answer presents part of the solution. NOTE: Each correct selection is worth one point.

A. Out of order Events 
B. Late Input Events 
C. Backlogged Input Events 
D. Function Events 


Sample Question 46

You are creating an Azure Data Factory data flow that will ingest data from a CSV file, cast columns to specified types of data, and insert the data into a table in an Azure Synapse Analytics dedicated SQL pool. The CSV file contains three columns named username, comment, and date.

The data flow already contains the following:
• A source transformation.
• A Derived Column transformation to set the appropriate types of data.
• A sink transformation to land the data in the pool.

You need to ensure that the data flow meets the following requirements:
• All valid rows must be written to the destination table.
• Truncation errors in the comment column must be avoided proactively.
• Any rows containing comment values that will cause truncation errors upon insert must be written to a file in blob storage.

Which two actions should you perform? Each correct answer presents part of the solution. NOTE: Each correct selection is worth one point.

A. To the data flow, add a sink transformation to write the rows to a file in blob storage. 
B. To the data flow, add a Conditional Split transformation to separate the rows that will cause truncation errors. 
C. To the data flow, add a filter transformation to filter out rows that will cause truncation errors
D. Add a select transformation to select only the rows that will cause truncation errors. 


Sample Question 47

You are developing a solution that will stream to Azure Stream Analytics. The solution will have both streaming data and reference data.

Which input type should you use for the reference data?

A. Azure Cosmos DB 
B. Azure Blob storage 
C. Azure IoT Hub 
D. Azure Event Hubs 
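
Reference data is a small, slowly changing lookup set (commonly stored in Azure Blob storage) that is joined to the stream without a temporal condition. A minimal Stream Analytics sketch with hypothetical input, output, and column names:

```sql
-- Azure Stream Analytics query language; input, output, and column names are hypothetical
SELECT
    s.DeviceId,
    s.Reading,
    r.DeviceName
INTO [EnrichedOutput]
FROM [DeviceStream] AS s TIMESTAMP BY EventTime
JOIN [DeviceReference] AS r        -- reference data input backed by Azure Blob storage
    ON s.DeviceId = r.DeviceId
```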


Sample Question 48

You have an Azure Synapse Analytics dedicated SQL pool that contains a table named Table1.

You have files that are ingested and loaded into an Azure Data Lake Storage Gen2 container named container1.

You plan to insert data from the files into Table1 and transform the data. Each row of data in the files will produce one row in the serving layer of Table1.

You need to ensure that when the source data files are loaded to container1, the DateTime is stored as an additional column in Table1.

Solution: You use a dedicated SQL pool to create an external table that has an additional DateTime column.

Does this meet the goal?

A. Yes 
B. No 


Sample Question 49

You plan to perform batch processing in Azure Databricks once daily.

Which type of Databricks cluster should you use?

A. High Concurrency 
B. automated 
C. interactive 


Sample Question 50

You have an Azure Synapse Analytics dedicated SQL pool named Pool1 and a database named DB1. DB1 contains a fact table named Table1.

You need to identify the extent of the data skew in Table1.

What should you do in Synapse Studio?

A. Connect to the built-in pool and query sys.dm_pdw_sys_info. 
B. Connect to Pool1 and run DBCC CHECKALLOC. 
C. Connect to the built-in pool and run DBCC CHECKALLOC. 
D. Connect to Pool1 and query sys.dm_pdw_nodes_db_partition_stats. 
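
If the DMV approach is intended, a per-distribution row-count query run against Pool1 exposes the skew. This sketch follows the documented table-size pattern and assumes Table1 is in the dbo schema:

```sql
-- T-SQL against Pool1 (not the built-in serverless pool); schema name is assumed
SELECT
    nps.distribution_id,
    SUM(nps.row_count) AS row_count
FROM sys.tables AS t
JOIN sys.pdw_table_mappings AS tm
    ON t.object_id = tm.object_id
JOIN sys.pdw_nodes_tables AS nt
    ON tm.physical_name = nt.name
JOIN sys.dm_pdw_nodes_db_partition_stats AS nps
    ON nt.object_id = nps.object_id
   AND nt.pdw_node_id = nps.pdw_node_id
   AND nt.distribution_id = nps.distribution_id
WHERE t.name = 'Table1'
GROUP BY nps.distribution_id
ORDER BY row_count DESC;    -- a wide spread across distributions indicates skew
```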


Sample Question 51

You are creating a new notebook in Azure Databricks that will support R as the primary language but will also support Scala and SQL.

Which switch should you use to switch between languages?

A. @<Language> 
B. %<Language> 
C. \\(<Language>) 
D. \\(<Language>) 
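
In a Databricks notebook, a per-cell magic command switches the language. For example, a cell in an R notebook can run SQL as shown below; the table name is hypothetical:

```sql
%sql
-- This cell runs as SQL even though the notebook's default language is R
SELECT COUNT(*) FROM sales;
```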


Sample Question 52

You use Azure Data Lake Storage Gen2.

You need to ensure that workloads can use filter predicates and column projections to filter data at the time the data is read from disk.

Which two actions should you perform? Each correct answer presents part of the solution. NOTE: Each correct selection is worth one point.

A. Reregister the Microsoft Data Lake Store resource provider. 
B. Reregister the Azure Storage resource provider. 
C. Create a storage policy that is scoped to a container. 
D. Register the query acceleration feature. 
E. Create a storage policy that is scoped to a container prefix filter. 


Sample Question 53

Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution. After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen.

You have an Azure Storage account that contains 100 GB of files. The files contain text and numerical values. 75% of the rows contain description data that has an average length of 1.1 MB.

You plan to copy the data from the storage account to an enterprise data warehouse in Azure Synapse Analytics.

You need to prepare the files to ensure that the data copies quickly.

Solution: You convert the files to compressed delimited text files.

Does this meet the goal?

A. Yes 
B. No 



Exam Code: DP-203
Exam Name: Data Engineering on Microsoft Azure
Last Update: May 13, 2024
Questions: 331