We provide our customers with the finest MLS-C01 preparation material available, in PDF format. Amazon MLS-C01 exam questions and answers are carefully analyzed and crafted around the latest exam patterns by our experts. This steadfast commitment to excellence has built lasting trust among the countless people who use our material to advance their careers. Our learning resources are designed to help students achieve a score of over 97% in the Amazon MLS-C01 exam. We value your time and investment, and we are committed to providing the best resources with no room for error.
Friendly Support Available 24/7:
If you face issues with our Amazon MLS-C01 exam dumps, our customer support specialists are ready to assist you promptly. Your success is our priority: we believe in quality, and our customers come first. Our team is available 24/7 to offer guidance and support for your Amazon MLS-C01 exam preparation. Feel free to reach out with any questions or if you run into any difficulty or confusion. We are committed to ensuring you have the study materials you need to excel.
Verified and approved Dumps for Amazon MLS-C01:
Our team of IT experts delivers the most accurate and reliable dumps for your Amazon MLS-C01 exam. All of the Amazon MLS-C01 study material is approved and verified by our team. This meticulously verified material, endorsed by our IT experts, ensures that you excel with distinction in the MLS-C01 exam. The resource consists of MLS-C01 exam questions and answers that mirror the actual exam format, facilitating effective preparation. Our committed team works tirelessly to make sure that our customers can confidently pass their exams on the first attempt, backed by the assurance that our MLS-C01 dumps have been thoroughly approved by our experts.
Amazon MLS-C01 Questions:
Embark on your certification journey with confidence: we provide the most reliable Amazon MLS-C01 dumps. Our commitment to your success comes with a 100% passing guarantee, ensuring that you pass your Amazon MLS-C01 exam on your first attempt. Our dedicated team of seasoned experts has carefully designed our Amazon MLS-C01 dumps PDF to align with the actual exam questions and answers. Trust our comprehensive MLS-C01 exam questions and answers to be your reliable companion for acing the MLS-C01 certification.
Amazon MLS-C01 Sample Questions
Question # 1
A data scientist stores financial datasets in Amazon S3. The data scientist uses Amazon Athena to query the datasets by using SQL. The data scientist uses Amazon SageMaker to deploy a machine learning (ML) model. The data scientist wants to obtain inferences from the model at the SageMaker endpoint. However, when the data scientist attempts to invoke the SageMaker endpoint, the data scientist receives SQL statement failures. The data scientist's IAM user is currently unable to invoke the SageMaker endpoint.
Which combination of actions will give the data scientist's IAM user the ability to invoke the SageMaker endpoint? (Select THREE.)
A. Attach the AmazonAthenaFullAccess AWS managed policy to the user identity.
B. Include a policy statement for the data scientist's IAM user that allows the IAM user to perform the sagemaker:InvokeEndpoint action.
C. Include an inline policy for the data scientist's IAM user that allows SageMaker to read S3 objects.
D. Include a policy statement for the data scientist's IAM user that allows the IAM user to perform the sagemaker:GetRecord action.
E. Include the SQL statement "USING EXTERNAL FUNCTION ml_function_name" in the Athena SQL query.
F. Perform a user remapping in SageMaker to map the IAM user to another IAM user that is on the hosted endpoint.
Answer: B,C,E
Explanation: The correct combination of actions to enable the data scientist’s IAM user to
invoke the SageMaker endpoint is B, C, and E, because they ensure that the IAM user has
the necessary permissions, access, and syntax to query the ML model from Athena. These
actions have the following benefits:
B: Including a policy statement for the IAM user that allows the
sagemaker:InvokeEndpoint action grants the IAM user the permission to call the
SageMaker Runtime InvokeEndpoint API, which is used to get inferences from the
model hosted at the endpoint1.
C: Including an inline policy for the IAM user that allows SageMaker to read S3
objects enables the IAM user to access the data stored in S3, which is the source
of the Athena queries2.
E: Including the SQL statement “USING EXTERNAL FUNCTION
ml_function_name” in the Athena SQL query allows the IAM user to invoke the ML
model as an external function from Athena, which is a feature that enables
querying ML models from SQL statements3.
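As a hedged illustration of actions B, C, and E, the Python sketch below uses boto3 to attach a hypothetical inline policy granting sagemaker:InvokeEndpoint plus S3 read access, and then runs an Athena query that calls the model as an external function. The user name, endpoint name, bucket, table, and function names are placeholders for illustration, not values taken from the question.

```python
import json
import boto3

iam = boto3.client("iam")
athena = boto3.client("athena")

# Hypothetical names used only for this sketch.
USER_NAME = "data-scientist"
ENDPOINT_NAME = "my-sagemaker-endpoint"
BUCKET = "my-financial-datasets"

# Inline policy: allow invoking the SageMaker endpoint (action B)
# and reading the S3 objects that back the Athena tables (action C).
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "sagemaker:InvokeEndpoint",
            "Resource": f"arn:aws:sagemaker:*:*:endpoint/{ENDPOINT_NAME}",
        },
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:ListBucket"],
            "Resource": [f"arn:aws:s3:::{BUCKET}", f"arn:aws:s3:::{BUCKET}/*"],
        },
    ],
}
iam.put_user_policy(
    UserName=USER_NAME,
    PolicyName="athena-sagemaker-inference",
    PolicyDocument=json.dumps(policy),
)

# Athena SQL that invokes the model as an external function (action E).
query = f"""
USING EXTERNAL FUNCTION predict_risk(features VARCHAR) RETURNS DOUBLE
SAGEMAKER '{ENDPOINT_NAME}'
SELECT transaction_id, predict_risk(features) AS score
FROM financial_data
"""
athena.start_query_execution(
    QueryString=query,
    ResultConfiguration={"OutputLocation": f"s3://{BUCKET}/athena-results/"},
)
```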
The other options are not correct or necessary, because they have the following
drawbacks:
A: Attaching the AmazonAthenaFullAccess AWS managed policy to the user
identity is not sufficient, because it does not grant the IAM user the permission to
invoke the SageMaker endpoint, which is required to query the ML model4.
D: Including a policy statement for the IAM user that allows the IAM user to
perform the sagemaker:GetRecord action is not relevant, because this action is
used to retrieve a single record from a feature group, which is not the case in this
scenario5.
F: Performing a user remapping in SageMaker to map the IAM user to another
IAM user that is on the hosted endpoint is not applicable, because this feature is
only available for multi-model endpoints, which are not used in this scenario.
References:
1: InvokeEndpoint - Amazon SageMaker
2: Querying Data in Amazon S3 from Amazon Athena - Amazon Athena
3: Querying machine learning models from Amazon Athena using Amazon
SageMaker | AWS Machine Learning Blog
4: AmazonAthenaFullAccess - AWS Identity and Access Management
5: GetRecord - Amazon SageMaker Feature Store Runtime
: [Invoke a Multi-Model Endpoint - Amazon SageMaker]
Question # 2
A Machine Learning Specialist is designing a scalable data storage solution for Amazon SageMaker. There is an existing TensorFlow-based model implemented as a train.py script that relies on static training data that is currently stored as TFRecords.
Which method of providing training data to Amazon SageMaker would meet the business requirements with the LEAST development overhead?
A. Use Amazon SageMaker script mode and use train.py unchanged. Point the Amazon SageMaker training invocation to the local path of the data without reformatting the training data.
B. Use Amazon SageMaker script mode and use train.py unchanged. Put the TFRecord data into an Amazon S3 bucket. Point the Amazon SageMaker training invocation to the S3 bucket without reformatting the training data.
C. Rewrite the train.py script to add a section that converts TFRecords to protobuf and ingests the protobuf data instead of TFRecords.
D. Prepare the data in the format accepted by Amazon SageMaker. Use AWS Glue or AWS Lambda to reformat and store the data in an Amazon S3 bucket.
Answer: B
Explanation: Amazon SageMaker script mode is a feature that allows users to use training
scripts similar to those they would use outside SageMaker with SageMaker’s prebuilt
containers for various frameworks such as TensorFlow. Script mode supports reading data
from Amazon S3 buckets without requiring any changes to the training script. Therefore,
option B is the best method of providing training data to Amazon SageMaker that would
meet the business requirements with the least development overhead.
Option A is incorrect because using a local path of the data would not be scalable or
reliable, as it would depend on the availability and capacity of the local storage. Moreover,
using a local path of the data would not leverage the benefits of Amazon S3, such as
durability, security, and performance. Option C is incorrect because rewriting the train.py
script to convert TFRecords to protobuf would require additional development effort and
complexity, as well as introduce potential errors and inconsistencies in the data format.
Option D is incorrect because preparing the data in the format accepted by Amazon
SageMaker would also require additional development effort and complexity, as well as
involve using additional services such as AWS Glue or AWS Lambda, which would
increase the cost and maintenance of the solution.
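For context, here is a minimal sketch of option B with the SageMaker Python SDK. The bucket path, IAM role ARN, framework version, and instance type are assumptions for illustration; train.py is passed to the estimator unchanged.

```python
import sagemaker
from sagemaker.tensorflow import TensorFlow

role = "arn:aws:iam::123456789012:role/SageMakerExecutionRole"  # placeholder role ARN

# Script mode: the existing train.py is the entry point, with no code changes.
estimator = TensorFlow(
    entry_point="train.py",
    role=role,
    instance_count=1,
    instance_type="ml.p3.2xlarge",   # assumed instance type
    framework_version="2.11",        # assumed TensorFlow version
    py_version="py39",
    sagemaker_session=sagemaker.Session(),
)

# The TFRecord files stay in their original format in S3; SageMaker exposes the
# channel to train.py (by default under /opt/ml/input/data/training).
estimator.fit({"training": "s3://my-training-bucket/tfrecords/"})
```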
References:
Bring your own model with Amazon SageMaker script mode
GitHub - aws-samples/amazon-sagemaker-script-mode
Deep Dive on TensorFlow training with Amazon SageMaker and Amazon S3
amazon-sagemaker-script-mode/generate_cifar10_tfrecords.py at master
Question # 3
A credit card company wants to identify fraudulent transactions in real time. A data scientist builds a machine learning model for this purpose. The transactional data is captured and stored in Amazon S3. The historic data is already labeled with two classes: fraud (positive) and fair transactions (negative). The data scientist removes all the missing data and builds a classifier by using the XGBoost algorithm in Amazon SageMaker. The model produces the following results:
• True positive rate (TPR): 0.700
• False negative rate (FNR): 0.300
• True negative rate (TNR): 0.977
• False positive rate (FPR): 0.023
• Overall accuracy: 0.949
Which solution should the data scientist use to improve the performance of the model?
A. Apply the Synthetic Minority Oversampling Technique (SMOTE) on the minority class in the training dataset. Retrain the model with the updated training data.
B. Apply the Synthetic Minority Oversampling Technique (SMOTE) on the majority class in the training dataset. Retrain the model with the updated training data.
C. Undersample the minority class.
D. Oversample the majority class.
Answer: A
Explanation: The solution that the data scientist should use to improve the performance of
the model is to apply the Synthetic Minority Oversampling Technique (SMOTE) on the
minority class in the training dataset, and retrain the model with the updated training data.
This solution can address the problem of class imbalance in the dataset, which can affect
the model’s ability to learn from the rare but important positive class (fraud).
Class imbalance is a common issue in machine learning, especially for classification tasks.
It occurs when one class (usually the positive or target class) is significantly
underrepresented in the dataset compared to the other class (usually the negative or nontarget
class). For example, in the credit card fraud detection problem, the positive class
(fraud) is much less frequent than the negative class (fair transactions). This can cause the
model to be biased towards the majority class, and fail to capture the characteristics and
patterns of the minority class. As a result, the model may have a high overall accuracy, but
a low recall or true positive rate for the minority class, which means it misses many
fraudulent transactions.
SMOTE is a technique that can help mitigate the class imbalance problem by generating
synthetic samples for the minority class. SMOTE works by finding the k-nearest neighbors
of each minority class instance, and randomly creating new instances along the line
segments connecting them. This way, SMOTE can increase the number and diversity of
the minority class instances, without duplicating or losing any information. By applying
SMOTE on the minority class in the training dataset, the data scientist can balance the
classes and improve the model’s performance on the positive class1.
The other options are either ineffective or counterproductive. Applying SMOTE on the
majority class would not balance the classes, but increase the imbalance and the size of
the dataset. Undersampling the minority class would reduce the number of instances
available for the model to learn from, and potentially lose some important information.
Oversampling the majority class would also increase the imbalance and the size of the
dataset, and introduce redundancy and overfitting.
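A minimal sketch of option A using the imbalanced-learn library follows; the synthetic dataset and XGBoost settings are placeholders, and only the training split is resampled so that evaluation stays honest.

```python
import xgboost as xgb
from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the labeled transaction data (1 = fraud, 0 = fair).
X, y = make_classification(
    n_samples=20000, n_features=20, weights=[0.97, 0.03], random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Oversample only the minority (fraud) class, and only in the training split.
X_resampled, y_resampled = SMOTE(random_state=42).fit_resample(X_train, y_train)

# Retrain the XGBoost classifier on the balanced training set.
model = xgb.XGBClassifier(eval_metric="logloss")
model.fit(X_resampled, y_resampled)
print("Held-out accuracy:", model.score(X_test, y_test))
```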
References:
1: SMOTE for Imbalanced Classification with Python - Machine Learning Mastery
Question # 4
A pharmaceutical company performs periodic audits of clinical trial sites to quickly resolve critical findings. The company stores audit documents in text format. Auditors have requested help from a data science team to quickly analyze the documents. The auditors need to discover the 10 main topics within the documents to prioritize and distribute the review work among the auditing team members. Documents that describe adverse events must receive the highest priority.
A data scientist will use statistical modeling to discover abstract topics and to provide a list of the top words for each category to help the auditors assess the relevance of the topic.
Which algorithms are best suited to this scenario? (Choose two.)
A. Latent Dirichlet allocation (LDA)
B. Random Forest classifier
C. Neural topic modeling (NTM)
D. Linear support vector machine
E. Linear regression
Answer: A,C
Explanation: The algorithms that are best suited to this scenario are latent Dirichlet
allocation (LDA) and neural topic modeling (NTM), as they are both unsupervised learning
methods that can discover abstract topics from a collection of text documents. LDA and
NTM can provide a list of the top words for each topic, as well as the topic distribution for
each document, which can help the auditors assess the relevance and priority of the
topic12.
The other options are not suitable because:
Option B: A random forest classifier is a supervised learning method that can
perform classification or regression tasks by using an ensemble of decision
trees. A random forest classifier is not suitable for discovering abstract topics from
text documents, as it requires labeled data and predefined classes3.
Option D: A linear support vector machine is a supervised learning method that
can perform classification or regression tasks by using a linear function that
separates the data into different classes. A linear support vector machine is not
suitable for discovering abstract topics from text documents, as it requires labeled
data and predefined classes4.
Option E: A linear regression is a supervised learning method that can perform
regression tasks by using a linear function that models the relationship between a
dependent variable and one or more independent variables. A linear regression is
not suitable for discovering abstract topics from text documents, as it requires
labeled data and a continuous output variable5.
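To make the "top words per topic" idea concrete, here is a small scikit-learn sketch of LDA; SageMaker's built-in LDA and NTM algorithms expose the same concept through training jobs. The documents are placeholders, and the choice of 10 topics mirrors the scenario.

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# Placeholder audit documents; in practice these would be loaded from the text files.
documents = [
    "adverse event reported at site during trial",
    "consent form missing signature for enrolled subject",
    "temperature excursion in drug storage refrigerator",
    # ... more documents ...
]

vectorizer = CountVectorizer(stop_words="english")
doc_term_matrix = vectorizer.fit_transform(documents)

# Fit an LDA model with the 10 topics requested by the auditors.
lda = LatentDirichletAllocation(n_components=10, random_state=0)
lda.fit(doc_term_matrix)

# Print the top words for each discovered topic.
vocab = vectorizer.get_feature_names_out()
for topic_idx, weights in enumerate(lda.components_):
    top_words = [vocab[i] for i in weights.argsort()[::-1][:10]]
    print(f"Topic {topic_idx}: {', '.join(top_words)}")
```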
References:
1: Latent Dirichlet Allocation
2: Neural Topic Modeling
3: Random Forest Classifier
4: Linear Support Vector Machine
5: Linear Regression
Question # 5
A media company wants to create a solution that identifies celebrities in pictures that users upload. The company also wants to identify the IP address and the timestamp details from the users so the company can prevent users from uploading pictures from unauthorized locations.
Which solution will meet these requirements with LEAST development effort?
A. Use AWS Panorama to identify celebrities in the pictures. Use AWS CloudTrail to capture IP address and timestamp details.
B. Use AWS Panorama to identify celebrities in the pictures. Make calls to the AWS Panorama Device SDK to capture IP address and timestamp details.
C. Use Amazon Rekognition to identify celebrities in the pictures. Use AWS CloudTrail to capture IP address and timestamp details.
D. Use Amazon Rekognition to identify celebrities in the pictures. Use the text detection feature to capture IP address and timestamp details.
Answer: C
Explanation: The solution C will meet the requirements with the least development effort
because it uses Amazon Rekognition and AWS CloudTrail, which are fully managed
services that can provide the desired functionality. The solution C involves the following
steps:
Use Amazon Rekognition to identify celebrities in the pictures. Amazon
Rekognition is a service that can analyze images and videos and extract insights
such as faces, objects, scenes, emotions, and more. Amazon Rekognition also
provides a feature called Celebrity Recognition, which can recognize thousands of
celebrities across a number of categories, such as politics, sports, entertainment,
and media. Amazon Rekognition can return the name, face, and confidence score
of the recognized celebrities, as well as additional information such as URLs and
biographies1.
Use AWS CloudTrail to capture IP address and timestamp details. AWS CloudTrail
is a service that can record the API calls and events made by or on behalf of AWS
accounts. AWS CloudTrail can provide information such as the source IP address,
the user identity, the request parameters, and the response elements of the API
calls. AWS CloudTrail can also deliver the event records to an Amazon S3 bucket
or an Amazon CloudWatch Logs group for further analysis and auditing2.
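A minimal boto3 sketch of the Amazon Rekognition half of option C is shown below; the bucket and object key are placeholders. The IP address and timestamp would come from the CloudTrail event records for the upload, not from this call.

```python
import boto3

rekognition = boto3.client("rekognition")

# Analyze an uploaded picture stored in S3 (placeholder bucket and key).
response = rekognition.recognize_celebrities(
    Image={"S3Object": {"Bucket": "user-uploads-bucket", "Name": "uploads/photo.jpg"}}
)

# Print the recognized celebrities and the confidence of each match.
for celebrity in response["CelebrityFaces"]:
    print(celebrity["Name"], celebrity["MatchConfidence"])
```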
The other options are not suitable because:
Option A: Using AWS Panorama to identify celebrities in the pictures and using
AWS CloudTrail to capture IP address and timestamp details will not meet the
requirements effectively. AWS Panorama is a service that can extend computer
vision to the edge, where it can run inference on video streams from cameras and
other devices. AWS Panorama is not designed for identifying celebrities in
pictures, and it may not provide accurate or relevant results. Moreover, AWS
Panorama requires the use of an AWS Panorama Appliance or a compatible
device, which may incur additional costs and complexity3.
Option B: Using AWS Panorama to identify celebrities in the pictures and making
calls to the AWS Panorama Device SDK to capture IP address and timestamp
details will not meet the requirements effectively, for the same reasons as option
A. Additionally, making calls to the AWS Panorama Device SDK will require more
development effort than using AWS CloudTrail, as it will involve writing custom
code and handling errors and exceptions4.
Option D: Using Amazon Rekognition to identify celebrities in the pictures and
using the text detection feature to capture IP address and timestamp details will
not meet the requirements effectively. The text detection feature of Amazon
Rekognition is used to detect and recognize text in images and videos, such as
street names, captions, product names, and license plates. It is not suitable for
capturing IP address and timestamp details, as these are not part of the pictures
that users upload. Moreover, the text detection feature may not be accurate or
reliable, as it depends on the quality and clarity of the text in the images and videos.
Question # 6
A retail company stores 100 GB of daily transactional data in Amazon S3 at periodic intervals. The company wants to identify the schema of the transactional data. The company also wants to perform transformations on the transactional data that is in Amazon S3.
The company wants to use a machine learning (ML) approach to detect fraud in the transformed data.
Which combination of solutions will meet these requirements with the LEAST operational overhead? (Select THREE.)
A. Use Amazon Athena to scan the data and identify the schema.
B. Use AWS Glue crawlers to scan the data and identify the schema.
C. Use Amazon Redshift stored procedures to perform data transformations.
D. Use AWS Glue workflows and AWS Glue jobs to perform data transformations.
E. Use Amazon Redshift ML to train a model to detect fraud.
F. Use Amazon Fraud Detector to train a model to detect fraud.
Answer: B,D,F
Explanation: To meet the requirements with the least operational overhead, the company
should use AWS Glue crawlers, AWS Glue workflows and jobs, and Amazon Fraud
Detector. AWS Glue crawlers can scan the data in Amazon S3 and identify the schema,
which is then stored in the AWS Glue Data Catalog. AWS Glue workflows and jobs can
perform data transformations on the data in Amazon S3 using serverless Spark or Python
scripts. Amazon Fraud Detector can train a model to detect fraud using the transformed
data and the company’s historical fraud labels, and then generate fraud predictions using a
simple API call.
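As a hedged sketch of the schema-discovery step (option B), the boto3 call below creates and starts a Glue crawler over the transactional data; the crawler name, IAM role, database, S3 path, and schedule are placeholders.

```python
import boto3

glue = boto3.client("glue")

# Create a crawler that scans the daily transaction files and writes the
# inferred schema to the Glue Data Catalog (all names are placeholders).
glue.create_crawler(
    Name="transactions-crawler",
    Role="arn:aws:iam::123456789012:role/GlueCrawlerRole",
    DatabaseName="retail_transactions",
    Targets={"S3Targets": [{"Path": "s3://retail-transactions-bucket/daily/"}]},
    Schedule="cron(0 2 * * ? *)",  # optional: run daily at 02:00 UTC
)

glue.start_crawler(Name="transactions-crawler")
```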
Option A is incorrect because Amazon Athena is a serverless query service that can
analyze data in Amazon S3 using standard SQL, but it does not perform data
transformations or fraud detection.
Option C is incorrect because Amazon Redshift is a cloud data warehouse that can store
and query data using SQL, but it requires provisioning and managing clusters, which adds
operational overhead. Moreover, Amazon Redshift does not provide a built-in fraud detection capability.
Option E is incorrect because Amazon Redshift ML is a feature that allows users to create,
train, and deploy machine learning models using SQL commands in Amazon Redshift.
However, using Amazon Redshift ML would require loading the data from Amazon S3 to
Amazon Redshift, which adds complexity and cost. Also, Amazon Redshift ML does not
support fraud detection as a use case.
References:
AWS Glue Crawlers
AWS Glue Workflows and Jobs
Amazon Fraud Detector
Question # 7
An automotive company uses computer vision in its autonomous cars. The company trained its object detection models successfully by using transfer learning from a convolutional neural network (CNN). The company trained the models by using PyTorch through the Amazon SageMaker SDK.
The vehicles have limited hardware and compute power. The company wants to optimize the model to reduce memory, battery, and hardware consumption without a significant sacrifice in accuracy.
Which solution will improve the computational efficiency of the models?
A. Use Amazon CloudWatch metrics to gain visibility into the SageMaker training weights, gradients, biases, and activation outputs. Compute the filter ranks based on the training information. Apply pruning to remove the low-ranking filters. Set new weights based on the pruned set of filters. Run a new training job with the pruned model.
B. Use Amazon SageMaker Ground Truth to build and run data labeling workflows. Collect a larger labeled dataset with the labeling workflows. Run a new training job that uses the new labeled data with previous training data.
C. Use Amazon SageMaker Debugger to gain visibility into the training weights, gradients, biases, and activation outputs. Compute the filter ranks based on the training information. Apply pruning to remove the low-ranking filters. Set the new weights based on the pruned set of filters. Run a new training job with the pruned model.
D. Use Amazon SageMaker Model Monitor to gain visibility into the ModelLatency metric and OverheadLatency metric of the model after the company deploys the model. Increase the model learning rate. Run a new training job.
Answer: C
Explanation: The solution C will improve the computational efficiency of the models
because it uses Amazon SageMaker Debugger and pruning, which are techniques that can
reduce the size and complexity of the convolutional neural network (CNN) models. The
solution C involves the following steps:
Use Amazon SageMaker Debugger to gain visibility into the training weights,
gradients, biases, and activation outputs. Amazon SageMaker Debugger is a
service that can capture and analyze the tensors that are emitted during the
training process of machine learning models. Amazon SageMaker Debugger can
provide insights into the model performance, quality, and convergence. Amazon
SageMaker Debugger can also help to identify and diagnose issues such as
overfitting, underfitting, vanishing gradients, and exploding gradients1.
Compute the filter ranks based on the training information. Filter ranking is a
technique that can measure the importance of each filter in a convolutional layer
based on some criterion, such as the average percentage of zero activations or
the L1-norm of the filter weights. Filter ranking can help to identify the filters that
have little or no contribution to the model output, and thus can be removed without
affecting the model accuracy2.
Apply pruning to remove the low-ranking filters. Pruning is a technique that can
reduce the size and complexity of a neural network by removing the redundant or
irrelevant parts of the network, such as neurons, connections, or filters. Pruning
can help to improve the computational efficiency, memory usage, and inference speed of the model, as well as to prevent overfitting and improve generalization3.
Set the new weights based on the pruned set of filters. After pruning, the model
will have a smaller and simpler architecture, with fewer filters in each convolutional
layer. The new weights of the model can be set based on the pruned set of filters,
either by initializing them randomly or by fine-tuning them from the original
weights4.
Run a new training job with the pruned model. The pruned model can be trained
again with the same or a different dataset, using the same or a different framework
or algorithm. The new training job can use the same or a different configuration of
Amazon SageMaker, such as the instance type, the hyperparameters, or the data
ingestion mode. The new training job can also use Amazon SageMaker Debugger
to monitor and analyze the training process and the model quality5.
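To make the pruning step concrete, the PyTorch sketch below uses structured L1-norm pruning to zero out entire low-ranking filters in each convolutional layer. It is a simplified illustration of the idea rather than the Debugger-driven ranking workflow described above, and the model and pruning amount are placeholders.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# Placeholder CNN standing in for the trained object detection backbone.
model = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(64, 128, kernel_size=3, padding=1),
    nn.ReLU(),
)

# Rank filters by L1 norm (n=1) along the output-channel dimension (dim=0) and
# zero out the lowest-ranking 30% in each convolutional layer; the zeroed filters
# can be physically removed when the model is exported for the vehicles.
for module in model.modules():
    if isinstance(module, nn.Conv2d):
        prune.ln_structured(module, name="weight", amount=0.3, n=1, dim=0)
        prune.remove(module, "weight")  # make the pruning permanent

# The pruned model would then be fine-tuned in a new training job.
zeroed = sum((p == 0).sum().item() for p in model.parameters())
total = sum(p.numel() for p in model.parameters())
print(f"Fraction of weights zeroed by pruning: {zeroed / total:.2%}")
```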
The other options are not suitable because:
Option A: Using Amazon CloudWatch metrics to gain visibility into the SageMaker
training weights, gradients, biases, and activation outputs will not be as effective
as using Amazon SageMaker Debugger. Amazon CloudWatch is a service that
can monitor and observe the operational health and performance of AWS
resources and applications. Amazon CloudWatch can provide metrics, alarms,
dashboards, and logs for various AWS services, including Amazon SageMaker.
However, Amazon CloudWatch does not provide the same level of granularity and
detail as Amazon SageMaker Debugger for the tensors that are emitted during the
training process of machine learning models. Amazon CloudWatch metrics are
mainly focused on the resource utilization and the training progress, not on the
model performance, quality, and convergence6.
Option B: Using Amazon SageMaker Ground Truth to build and run data labeling
workflows and collecting a larger labeled dataset with the labeling workflows will
not improve the computational efficiency of the models. Amazon SageMaker
Ground Truth is a service that can create high-quality training datasets for machine
learning by using human labelers. A larger labeled dataset can help to improve the
model accuracy and generalization, but it will not reduce the memory, battery, and
hardware consumption of the model. Moreover, a larger labeled dataset may
increase the training time and cost of the model7.
Option D: Using Amazon SageMaker Model Monitor to gain visibility into the
ModelLatency metric and OverheadLatency metric of the model after the company
deploys the model and increasing the model learning rate will not improve the
computational efficiency of the models. Amazon SageMaker Model Monitor is a
service that can monitor and analyze the quality and performance of machine
learning models that are deployed on Amazon SageMaker endpoints. The
ModelLatency metric and the OverheadLatency metric can measure the inference
latency of the model and the endpoint, respectively. However, these metrics do not
provide any information about the training weights, gradients, biases, and
activation outputs of the model, which are needed for pruning. Moreover,
increasing the model learning rate will not reduce the size and complexity of the
model, but it may affect the model convergence and accuracy.
References:
1: Amazon SageMaker Debugger
2: Pruning Convolutional Neural Networks for Resource Efficient Inference
3: Pruning Neural Networks: A Survey
4: Learning both Weights and Connections for Efficient Neural Networks
5: Amazon SageMaker Training Jobs
6: Amazon CloudWatch Metrics for Amazon SageMaker
7: Amazon SageMaker Ground Truth
: Amazon SageMaker Model Monitor
Question # 8
A media company is building a computer vision model to analyze images that are on social media. The model consists of CNNs that the company trained by using images that the company stores in Amazon S3. The company used an Amazon SageMaker training job in File mode with a single Amazon EC2 On-Demand Instance.
Every day, the company updates the model by using about 10,000 images that the company has collected in the last 24 hours. The company configures training with only one epoch. The company wants to speed up training and lower costs without the need to make any code changes.
Which solution will meet these requirements?
A. Instead of File mode, configure the SageMaker training job to use Pipe mode. Ingest the data from a pipe.
B. Instead of File mode, configure the SageMaker training job to use FastFile mode with no other changes.
C. Instead of On-Demand Instances, configure the SageMaker training job to use Spot Instances. Make no other changes.
D. Instead of On-Demand Instances, configure the SageMaker training job to use Spot Instances. Implement model checkpoints.
Answer: C
Explanation: The solution C will meet the requirements because it uses Amazon
SageMaker Spot Instances, which are unused EC2 instances that are available at up to
90% discount compared to On-Demand prices. Amazon SageMaker Spot Instances can
speed up training and lower costs by taking advantage of the spare EC2 capacity. The
company does not need to make any code changes to use Spot Instances, as it can simply
enable the managed spot training option in the SageMaker training job configuration. The
company also does not need to implement model checkpoints, as it is using only one
epoch for training, which means the model will not resume from a previous state1.
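A hedged sketch of enabling managed spot training (option C) with the SageMaker Python SDK follows; the container image URI, role ARN, instance type, and timing limits are placeholders, and the training code itself is untouched.

```python
import sagemaker
from sagemaker.estimator import Estimator

role = "arn:aws:iam::123456789012:role/SageMakerExecutionRole"  # placeholder

estimator = Estimator(
    image_uri="<existing-training-image-uri>",  # same training container as before
    role=role,
    instance_count=1,
    instance_type="ml.p3.2xlarge",
    use_spot_instances=True,   # switch from On-Demand to managed Spot capacity
    max_run=3600,              # cap on actual training seconds
    max_wait=7200,             # cap on training time plus time spent waiting for Spot
    sagemaker_session=sagemaker.Session(),
)

estimator.fit({"training": "s3://media-images-bucket/daily-batch/"})
```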
The other options are not suitable because:
Option A: Configuring the SageMaker training job to use Pipe mode instead of File
mode will not speed up training or lower costs significantly. Pipe mode is a data
ingestion mode that streams data directly from S3 to the training algorithm, without
copying the data to the local storage of the training instance. Pipe mode can
reduce the startup time of the training job and the disk space usage, but it does not
affect the computation time or the instance price. Moreover, Pipe mode may
require some code changes to handle the streaming data, depending on the
training algorithm2.
Option B: Configuring the SageMaker training job to use FastFile mode instead of
File mode will not speed up training or lower costs significantly. FastFile mode is a
data ingestion mode that copies data from S3 to the local storage of the training
instance in parallel with the training process. FastFile mode can reduce the startup
time of the training job and the disk space usage, but it does not affect the
computation time or the instance price. Moreover, FastFile mode is only available
for distributed training jobs that use multiple instances, which is not the case for
the company3.
Option D: Configuring the SageMaker training job to use Spot Instances and
implementing model checkpoints will not meet the requirements without the need
to make any code changes. Model checkpoints are a feature that allows the
training job to save the model state periodically to S3, and resume from the latest
checkpoint if the training job is interrupted. Model checkpoints can help to avoid
losing the training progress and ensure the model convergence, but they require
some code changes to implement the checkpointing logic and the resuming logic4.
References:
1: Managed Spot Training - Amazon SageMaker
2: Pipe Mode - Amazon SageMaker
3: FastFile Mode - Amazon SageMaker
4: Checkpoints - Amazon SageMaker
Question # 9
A data scientist is building a forecasting model for a retail company by using the most recent 5 years of sales records that are stored in a data warehouse. The dataset contains sales records for each of the company's stores across five commercial regions. The data scientist creates a working dataset with StoreID, Region, Date, and Sales Amount as columns. The data scientist wants to analyze yearly average sales for each region. The scientist also wants to compare how each region performed compared to average sales across all commercial regions.
Which visualization will help the data scientist better understand the data trend?
A. Create an aggregated dataset by using the Pandas GroupBy function to get average sales for each year for each store. Create a bar plot, faceted by year, of average sales for each store. Add an extra bar in each facet to represent average sales.
B. Create an aggregated dataset by using the Pandas GroupBy function to get average sales for each year for each store. Create a bar plot, colored by region and faceted by year, of average sales for each store. Add a horizontal line in each facet to represent average sales.
C. Create an aggregated dataset by using the Pandas GroupBy function to get average sales for each year for each region. Create a bar plot of average sales for each region. Add an extra bar in each facet to represent average sales.
D. Create an aggregated dataset by using the Pandas GroupBy function to get average sales for each year for each region. Create a bar plot, faceted by year, of average sales for each region. Add a horizontal line in each facet to represent average sales.
Answer: D
Explanation: The best visualization for this task is to create a bar plot, faceted by year, of
average sales for each region and add a horizontal line in each facet to represent average
sales. This way, the data scientist can easily compare the yearly average sales for each
region with the overall average sales and see the trends over time. The bar plot also allows
the data scientist to see the relative performance of each region within each year and
across years. The other options are less effective because they either do not show the
yearly trends, do not show the overall average sales, or do not group the data by region.
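A small pandas/seaborn sketch of option D is shown below; the DataFrame columns follow the working dataset described in the question, but the data itself is synthetic and purely illustrative.

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Synthetic stand-in for the working dataset (Region, Date, Sales Amount).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "Region": rng.choice(["R1", "R2", "R3", "R4", "R5"], size=5000),
    "Date": pd.to_datetime(rng.choice(pd.date_range("2019-01-01", "2023-12-31"), size=5000)),
    "Sales Amount": rng.gamma(2.0, 500.0, size=5000),
})
df["Year"] = df["Date"].dt.year

# Yearly average sales per region (Pandas GroupBy step).
avg = df.groupby(["Year", "Region"], as_index=False)["Sales Amount"].mean()

# Bar plot of average sales per region, faceted by year, with a horizontal line
# in each facet for that year's average across all regions.
years = sorted(df["Year"].unique())
g = sns.catplot(
    data=avg, x="Region", y="Sales Amount",
    col="Year", col_order=years, col_wrap=3, kind="bar",
)
for ax, year in zip(g.axes.flat, years):
    ax.axhline(df.loc[df["Year"] == year, "Sales Amount"].mean(),
               color="red", linestyle="--")
plt.show()
```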
Question # 10
A data scientist is training a large PyTorch model by using Amazon SageMaker. It takes 10 hours on average to train the model on GPU instances. The data scientist suspects that training is not converging and that resource utilization is not optimal.
What should the data scientist do to identify and address training issues with the LEAST development effort?
A. Use CPU utilization metrics that are captured in Amazon CloudWatch. Configure a CloudWatch alarm to stop the training job early if low CPU utilization occurs.
B. Use high-resolution custom metrics that are captured in Amazon CloudWatch. Configure an AWS Lambda function to analyze the metrics and to stop the training job early if issues are detected.
C. Use the SageMaker Debugger vanishing_gradient and LowGPUUtilization built-in rules to detect issues and to launch the StopTrainingJob action if issues are detected.
D. Use the SageMaker Debugger confusion and feature_importance_overweight built-in rules to detect issues and to launch the StopTrainingJob action if issues are detected.
Answer: C
Explanation: The solution C is the best option to identify and address training issues with
the least development effort. The solution C involves the following steps:
Use the SageMaker Debugger vanishing_gradient and LowGPUUtilization built-in
rules to detect issues. SageMaker Debugger is a feature of Amazon SageMaker
that allows data scientists to monitor, analyze, and debug machine learning
models during training. SageMaker Debugger provides a set of built-in rules that
can automatically detect common issues and anomalies in model training, such as
vanishing or exploding gradients, overfitting, underfitting, low GPU utilization, and
more1. The data scientist can use the vanishing_gradient rule to check if the
gradients are becoming too small and causing the training to not converge. The
data scientist can also use the LowGPUUtilization rule to check if the GPU
resources are underutilized and causing the training to be inefficient2.
Launch the StopTrainingJob action if issues are detected. SageMaker Debugger
can also take actions based on the status of the rules. One of the actions is
StopTrainingJob, which can terminate the training job if a rule is in an error
state. This can help the data scientist to save time and money by stopping the
training early if issues are detected3.
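A hedged sketch of attaching the built-in rules to a PyTorch training job with the SageMaker Python SDK is shown below; the training script, role ARN, instance settings, and framework version are placeholders. In this sketch the StopTraining action is attached to the vanishing_gradient rule, and the LowGPUUtilization profiler rule is enabled alongside it for visibility.

```python
import sagemaker
from sagemaker.debugger import ProfilerRule, Rule, rule_configs
from sagemaker.pytorch import PyTorch

role = "arn:aws:iam::123456789012:role/SageMakerExecutionRole"  # placeholder

# Built-in action: stop the training job when the rule fires.
stop_action = rule_configs.ActionList(rule_configs.StopTraining())

rules = [
    # Detects gradients shrinking toward zero (training not converging).
    Rule.sagemaker(rule_configs.vanishing_gradient(), actions=stop_action),
    # Profiler rule that flags under-utilized GPUs.
    ProfilerRule.sagemaker(rule_configs.low_gpu_utilization()),
]

estimator = PyTorch(
    entry_point="train.py",          # existing training script (placeholder)
    role=role,
    instance_count=1,
    instance_type="ml.p3.2xlarge",
    framework_version="1.13",
    py_version="py39",
    rules=rules,
)
estimator.fit({"training": "s3://training-data-bucket/"})
```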
The other options are not suitable because:
Option A: Using CPU utilization metrics that are captured in Amazon CloudWatch
and configuring a CloudWatch alarm to stop the training job early if low CPU
utilization occurs will not identify and address training issues effectively. CPU
utilization is not a good indicator of model training performance, especially for GPU
instances. Moreover, CloudWatch alarms can only trigger actions based on simple
thresholds, not complex rules or conditions4.
Option B: Using high-resolution custom metrics that are captured in Amazon
CloudWatch and configuring an AWS Lambda function to analyze the metrics and
to stop the training job early if issues are detected will incur more development effort than using SageMaker Debugger. The data scientist will have to write the
code for capturing, sending, and analyzing the custom metrics, as well as for
invoking the Lambda function and stopping the training job. Moreover, this solution
may not be able to detect all the issues that SageMaker Debugger can5.
Option D: Using the SageMaker Debugger confusion and
feature_importance_overweight built-in rules and launching the StopTrainingJob
action if issues are detected will not identify and address training issues effectively.
The confusion rule is used to monitor the confusion matrix of a classification
model, which is not relevant for a regression model that predicts prices. The
feature_importance_overweight rule is used to check if some features have too
much weight in the model, which may not be related to the convergence or
resource utilization issues2.
References:
1: Amazon SageMaker Debugger
2: Built-in Rules for Amazon SageMaker Debugger
3: Actions for Amazon SageMaker Debugger
4: Amazon CloudWatch Alarms
5: Amazon CloudWatch Custom Metrics
Question # 11
A company builds computer-vision models that use deep learning for the autonomous vehicle industry. A machine learning (ML) specialist uses an Amazon EC2 instance that has a CPU:GPU ratio of 12:1 to train the models.
The ML specialist examines the instance metric logs and notices that the GPU is idle half of the time. The ML specialist must reduce training costs without increasing the duration of the training jobs.
Which solution will meet these requirements?
A. Switch to an instance type that has only CPUs.
B. Use a heterogeneous cluster that has two different instance groups.
C. Use memory-optimized EC2 Spot Instances for the training jobs.
D. Switch to an instance type that has a CPU:GPU ratio of 6:1.
Answer: D
Explanation: Switching to an instance type that has a CPU: GPU ratio of 6:1 will reduce
the training costs by using fewer CPUs and GPUs, while maintaining the same level of
performance. The GPU idle time indicates that the CPU is not able to feed the GPU with
enough data, so reducing the CPU: GPU ratio will balance the workload and improve the
GPU utilization. A lower CPU: GPU ratio also means less overhead for inter-process
communication and synchronization between the CPU and GPU processes.
References:
Optimizing GPU utilization for AI/ML workloads on Amazon EC2
Analyze CPU vs. GPU Performance for AWS Machine Learning
Question # 12
An engraving company wants to automate its quality control process for plaques. The company performs the process before mailing each customized plaque to a customer. The company has created an Amazon S3 bucket that contains images of defects that should cause a plaque to be rejected. Low-confidence predictions must be sent to an internal team of reviewers who are using Amazon Augmented AI (Amazon A2I).
Which solution will meet these requirements?
A. Use Amazon Textract for automatic processing. Use Amazon A2I with Amazon Mechanical Turk for manual review.
B. Use Amazon Rekognition for automatic processing. Use Amazon A2I with a private workforce option for manual review.
C. Use Amazon Transcribe for automatic processing. Use Amazon A2I with a private workforce option for manual review.
D. Use AWS Panorama for automatic processing. Use Amazon A2I with Amazon Mechanical Turk for manual review.
Answer: B
Explanation: Amazon Rekognition is a service that provides computer vision capabilities
for image and video analysis, such as object, scene, and activity detection, face and text
recognition, and custom label detection. Amazon Rekognition can be used to automate the
quality control process for plaques by comparing the images of the plaques with the images
of defects in the Amazon S3 bucket and returning a confidence score for each defect.
Amazon A2I is a service that enables human review of machine learning predictions, such
as low-confidence predictions from Amazon Rekognition. Amazon A2I can be integrated
with a private workforce option, which allows the engraving company to use its own internal
team of reviewers to manually inspect the plaques that are flagged by Amazon
Rekognition. This solution meets the requirements of automating the quality control
process, sending low-confidence predictions to an internal team of reviewers, and using Amazon A2I for manual review.
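A hedged boto3 sketch of this flow is shown below: a trained Rekognition Custom Labels model scores a plaque image, and low-confidence results are routed to a pre-created Amazon A2I human review flow backed by a private workforce. The project version ARN, flow definition ARN, bucket, key, and threshold are all placeholders.

```python
import json
import boto3

rekognition = boto3.client("rekognition")
a2i = boto3.client("sagemaker-a2i-runtime")

PROJECT_VERSION_ARN = "arn:aws:rekognition:...:project/plaque-defects/version/1"  # placeholder
FLOW_DEFINITION_ARN = "arn:aws:sagemaker:...:flow-definition/plaque-review"       # placeholder
CONFIDENCE_THRESHOLD = 80.0

image = {"S3Object": {"Bucket": "plaque-images", "Name": "outgoing/plaque-123.jpg"}}

# Score the plaque against the defect labels the custom model was trained on.
result = rekognition.detect_custom_labels(
    ProjectVersionArn=PROJECT_VERSION_ARN, Image=image, MinConfidence=0
)
labels = result["CustomLabels"]

# Route low-confidence predictions to the private workforce via Amazon A2I.
if not labels or max(label["Confidence"] for label in labels) < CONFIDENCE_THRESHOLD:
    a2i.start_human_loop(
        HumanLoopName="plaque-123-review",
        FlowDefinitionArn=FLOW_DEFINITION_ARN,
        HumanLoopInput={"InputContent": json.dumps({"image": image, "labels": labels})},
    )
```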
References:
1: Amazon Rekognition documentation
2: Amazon A2I documentation
3: Amazon Rekognition Custom Labels documentation
4: Amazon A2I Private Workforce documentation
Question # 13
An Amazon SageMaker notebook instance is launched into Amazon VPC. The SageMaker notebook references data contained in an Amazon S3 bucket in another account. The bucket is encrypted using SSE-KMS. The instance returns an access denied error when trying to access data in Amazon S3.
Which of the following are required to access the bucket and avoid the access denied error? (Select THREE)
A. An AWS KMS key policy that allows access to the customer master key (CMK)
B. A SageMaker notebook security group that allows access to Amazon S3
C. An IAM role that allows access to the specific S3 bucket
D. A permissive S3 bucket policy
E. An S3 bucket owner that matches the notebook owner
F. A SageMaker notebook subnet ACL that allows traffic to Amazon S3
Answer: A,B,C
Explanation: To access an Amazon S3 bucket in another account that is encrypted using
SSE-KMS, the following are required:
A. An AWS KMS key policy that allows access to the customer master key (CMK).
The CMK is the encryption key that is used to encrypt and decrypt the data in the
S3 bucket. The KMS key policy defines who can use and manage the CMK. To
allow access to the CMK from another account, the key policy must include a
statement that grants the necessary permissions (such as kms:Decrypt) to the
principal from the other account (such as the SageMaker notebook IAM role).
B. A SageMaker notebook security group that allows access to Amazon S3. A
security group is a virtual firewall that controls the inbound and outbound traffic for
the SageMaker notebook instance. To allow the notebook instance to access the
S3 bucket, the security group must have a rule that allows outbound traffic to the
S3 endpoint on port 443 (HTTPS).
C. An IAM role that allows access to the specific S3 bucket. An IAM role is an
identity that can be assumed by the SageMaker notebook instance to access AWS
resources. The IAM role must have a policy that grants the necessary permissions
(such as s3:GetObject) to access the specific S3 bucket. The policy must also
include a condition that allows access to the CMK in the other account.
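A minimal sketch of the IAM-role side (items A and C) follows, attaching an inline policy to the notebook's execution role that allows reading the cross-account bucket and decrypting with its CMK. The role name, bucket ARN, and key ARN are placeholders, and the bucket owner must still grant matching permissions in the KMS key policy (and bucket policy) in the other account.

```python
import json
import boto3

iam = boto3.client("iam")

# Placeholders for the notebook execution role, the cross-account bucket, and the CMK.
ROLE_NAME = "SageMakerNotebookExecutionRole"
BUCKET_ARN = "arn:aws:s3:::cross-account-data-bucket"
CMK_ARN = "arn:aws:kms:us-east-1:222222222222:key/11111111-2222-3333-4444-555555555555"

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {   # Read access to the specific S3 bucket (requirement C).
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:ListBucket"],
            "Resource": [BUCKET_ARN, f"{BUCKET_ARN}/*"],
        },
        {   # Permission to use the CMK that encrypts the objects; this only works
            # if the key policy in the other account also allows it (requirement A).
            "Effect": "Allow",
            "Action": ["kms:Decrypt", "kms:DescribeKey"],
            "Resource": CMK_ARN,
        },
    ],
}

iam.put_role_policy(
    RoleName=ROLE_NAME,
    PolicyName="cross-account-s3-kms-read",
    PolicyDocument=json.dumps(policy),
)
```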
The following are not required or correct:
D. A permissive S3 bucket policy. A bucket policy is a resource-based policy that
defines who can access the S3 bucket and what actions they can perform. A
permissive bucket policy is not required and not recommended, as it can expose
the bucket to unauthorized access. A bucket policy should follow the principle of
least privilege and grant the minimum permissions necessary to the specific
principals that need access.
E. An S3 bucket owner that matches the notebook owner. The S3 bucket owner
and the notebook owner do not need to match, as long as the bucket owner grants
cross-account access to the notebook owner through the KMS key policy and the
bucket policy (if applicable).
F. A SageMaker notebook subnet ACL that allows traffic to Amazon S3. A subnet
ACL is a network access control list that acts as an optional layer of security for
the SageMaker notebook instance’s subnet. A subnet ACL is not required to
access the S3 bucket, as the security group is sufficient to control the traffic.
However, if a subnet ACL is used, it must not block the traffic to the S3 endpoint.