API

class AutoML(output_folder, automl_id='AlphaD3M', container_runtime='docker', resource_folder=None, grpc_port=None, verbose=False)

Create an AutoML object

Parameters
  • output_folder – Path to the output directory

  • automl_id – Name of the AutoML system to use. The available systems are ‘AlphaD3M’ and ‘AutonML’. Currently, only AlphaD3M supports the container_runtime=’pypi’ option

  • container_runtime – The container runtime to use; it can be ‘docker’, ‘singularity’, ‘pypi’, or ‘local’

  • resource_folder – Path to the directory where the resources are stored. This is needed only for some primitives that use pre-trained models, databases, etc.

  • grpc_port – Port to be used by GRPC

  • verbose – Whether or not to show all the logs from the AutoML system
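As a sketch, the constructor might be used as follows. The import path (from d3m_interface import AutoML) and the output folder are assumptions, and the call itself is commented out because it requires the package and a container runtime to be installed:

```python
# Hypothetical set-up of an AutoML session. The constructor call is commented
# out because it needs the d3m_interface package and a container runtime;
# the output path is a placeholder.
settings = dict(
    output_folder="/tmp/automl_output",  # where pipelines and logs are written
    automl_id="AlphaD3M",                # 'AlphaD3M' or 'AutonML'
    container_runtime="pypi",            # 'pypi' is supported only by AlphaD3M
    verbose=False,
)

# from d3m_interface import AutoML  # assumed import path
# automl = AutoML(**settings)

# Sanity check for the 'pypi' restriction documented above:
if settings["container_runtime"] == "pypi":
    assert settings["automl_id"] == "AlphaD3M"
```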

search_pipelines(dataset, time_bound, time_bound_run=5, target=None, metric=None, task_keywords=None, method='holdout', stratified=False, shuffle=True, folds=10, train_ratio=0.7, random_seed=0, exclude_primitives=None, include_primitives=None, **kwargs)

Search for pipelines

Parameters
  • dataset – Path to the dataset. It supports CSV files and D3M, OpenML, and Sklearn datasets

  • time_bound – Time limit in minutes to perform the search

  • time_bound_run – Time limit in minutes to score a pipeline

  • target – Column name of the target variable for the problem

  • metric – The supported metrics are the following: accuracy, f1, f1Macro, f1Micro, hammingLoss, hitsAtK, jaccardSimilarityScore, meanAbsoluteError, meanReciprocalRank, meanSquaredError, normalizedMutualInformation, objectDetectionAP, precision, precisionAtTopK, recall, rocAuc, rocAucMacro, rocAucMicro, rootMeanSquaredError, rSquared

  • task_keywords – A list of keywords that capture the nature of the machine learning task. The keywords that can be combined to describe the task are the following: audio, binary, classification, clustering, collaborativeFiltering, communityDetection, forecasting, geospatial, graph, graphMatching, grouped, image, linkPrediction, lupi, missingMetadata, multiClass, multiGraph, multiLabel, multivariate, nested, nonOverlapping, objectDetection, overlapping, regression, relational, remoteSensing, semiSupervised, speech, tabular, text, timeSeries, univariate, vertexClassification, vertexNomination, video

  • method – Method to score the pipeline: holdout, cross_validation

  • stratified – Whether or not to split the data using a stratified strategy

  • shuffle – Whether or not to shuffle the data before splitting

  • folds – Number of folds to use in cross-validation

  • train_ratio – The proportion of the dataset to include in the train split

  • random_seed – The seed used by the random number generator

  • exclude_primitives – List of primitive names to exclude from the search space. If None, all the primitives will be used in the search

  • include_primitives – List of primitive names to include in the search space. If None, all the primitives will be used in the search

  • kwargs – Additional arguments for problem settings (e.g. pos_label for binary problems scored with F1)
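For illustration, a binary-classification search over a CSV file might be set up like this. The dataset path, target name, and positive label are placeholders, and the search call is commented out because it launches the AutoML system:

```python
# Hypothetical arguments for a binary-classification search on a CSV file.
search_args = dict(
    dataset="/tmp/dataset/train.csv",   # placeholder path
    time_bound=10,                      # search budget, in minutes
    target="label",                     # placeholder target column
    metric="f1",
    task_keywords=["classification", "binary", "tabular"],
    method="holdout",
    train_ratio=0.7,
)

# F1 on a binary task needs a positive label, which goes in via **kwargs:
# automl.search_pipelines(**search_args, pos_label="yes")

assert "binary" in search_args["task_keywords"]
```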

train(pipeline_id, expose_outputs=None)

Train a model using a specific ML pipeline

Parameters
  • pipeline_id – Pipeline id

  • expose_outputs – The pipeline step outputs to expose. If None, no step outputs are exposed. If str, it should be ‘all’ to expose the output of every step in the pipeline. If list, it should contain the ids of the steps, e.g. ‘steps.2.produce’

Returns

The id of the fitted pipeline, with or without the pipeline step outputs

test(pipeline_id, test_dataset, expose_outputs=None, calculate_confidence=False)

Test a model

Parameters
  • pipeline_id – The id of a fitted pipeline

  • test_dataset – Path to the dataset. It supports D3M datasets and CSV files

  • expose_outputs – The pipeline step outputs to expose. If None, no step outputs are exposed. If str, it should be ‘all’ to expose the output of every step in the pipeline. If list, it should contain the ids of the steps, e.g. ‘steps.2.produce’

  • calculate_confidence – Whether or not to return the confidence instead of the predictions

Returns

A DataFrame that contains the predictions, with or without the pipeline step outputs

score(pipeline_id, test_dataset)

Compute the score of a model on a test dataset

Parameters
  • pipeline_id – The id of a pipeline or a Pipeline object

  • test_dataset – Path to the dataset. It supports D3M datasets and CSV files

Returns

A tuple holding metric name and score value
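The three methods above form the typical post-search lifecycle: train the best pipeline, predict on a held-out set, then score it. In this sketch every library call is commented out because it requires a live session created by search_pipelines, and the test path is a placeholder:

```python
# Hypothetical post-search lifecycle (calls commented out; they need a
# live AutoML session and a fitted pipeline).
test_path = "/tmp/dataset/test.csv"  # placeholder path

# best_id = automl.get_best_pipeline_id()
# model_id = automl.train(best_id, expose_outputs="all")       # expose every step
# predictions = automl.test(model_id, test_path)               # a DataFrame
# metric_name, score_value = automl.score(best_id, test_path)  # (name, value) tuple

# The list form of expose_outputs names individual steps:
expose = ["steps.2.produce"]
assert expose[0].startswith("steps.")
```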

save_pipeline(pipeline_id, output_folder)

Save a pipeline to disk

Parameters
  • pipeline_id – The id of the pipeline to be saved

  • output_folder – Path to the folder where the pipeline will be saved

load_pipeline(pipeline_path)

Load a previously saved pipeline

Parameters

pipeline_path – Path to the folder where the pipeline is saved
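A save/load round trip might look like the sketch below. The save and load calls are commented out because they require a session and a pipeline id; only the folder handling actually runs, and the folder name is an assumption:

```python
# Sketch of persisting and reloading a pipeline (library calls commented out).
import os
import tempfile

pipeline_folder = os.path.join(tempfile.gettempdir(), "saved_pipelines")
os.makedirs(pipeline_folder, exist_ok=True)  # folder must exist before saving

# automl.save_pipeline(pipeline_id, pipeline_folder)   # pipeline_id from a search
# pipeline = automl.load_pipeline(pipeline_folder)     # reload in a later session

assert os.path.isdir(pipeline_folder)
```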

get_best_pipeline_id()

Get the id of the best pipeline

Returns

The id of the best pipeline

list_primitives()

Get a list of primitives used by the AutoML system

Returns

List of primitives used by the AutoML system

create_pipelineprofiler_inputs(test_dataset=None, source_name=None)

Create an input in the format supported by PipelineProfiler, based on the pipelines generated by the AutoML system

Parameters
  • test_dataset – Path to the dataset. If None, the search scores will be used; otherwise, the pipelines will be scored on the given dataset

  • source_name – Name of the pipeline source. If None, the AutoML id will be used

Returns

List of pipelines in the PipelineProfiler input format

create_textanalizer_inputs(dataset, text_column, label_column, positive_label=1, negative_label=0)

Create an input in the format supported by VisualTextAnalyzer

Parameters
  • dataset – Path to the dataset. It supports D3M datasets and CSV files

  • text_column – Name of the column that contains the texts

  • label_column – Name of the column that contains the classes

  • positive_label – Label for the positive class

  • negative_label – Label for the negative class

export_pipeline_code(pipeline_id, ipython_cell=True)

Convert a pipeline description to an executable Python script

Parameters
  • pipeline_id – Pipeline id

  • ipython_cell – Whether or not to show the Python code in a Jupyter Notebook cell

end_session()

Safely end the session in the D3M Interface

plot_leaderboard()

Plot pipelines’ leaderboard

plot_summary_dataset(dataset, text_column=None)

Plot histograms of the dataset

Parameters
  • dataset – Path to dataset. It supports D3M dataset, and CSV file

  • text_column – Name of the column that contains the texts. Only needed for D3M datasets that contain collections

plot_comparison_pipelines(test_dataset=None, source_name=None, precomputed_pipelines=None)

Plot PipelineProfiler visualization

Parameters
  • test_dataset – Path to the dataset. If None, the search scores will be used; otherwise, the pipelines will be scored on the given dataset

  • source_name – Name of the pipeline source. If None, the AutoML id will be used

  • precomputed_pipelines – If not None, load previously computed pipelines

plot_text_analysis(dataset=None, text_column=None, label_column=None, positive_label=1, negative_label=0, precomputed_data=None)

Plot a visualization for text datasets

Parameters
  • dataset – Path to the dataset. It supports D3M datasets and CSV files

  • text_column – Name of the column that contains the texts

  • label_column – Name of the column that contains the classes

  • positive_label – Label for the positive class

  • negative_label – Label for the negative class

  • precomputed_data – If not None, load previously computed words/named entities

plot_text_explanation(model_id, instance_text, text_column, label_column, num_features=5, top_labels=1)

Plot a LIME visualization for model explanation

Parameters
  • model_id – Model id

  • instance_text – Text to be explained

  • text_column – Name of the column that contains the texts

  • label_column – Name of the column that contains the classes

  • num_features – Maximum number of features present in the explanation

  • top_labels – Number of labels with highest prediction probabilities to use in the explanations

static add_new_automl(automl_id, docker_image_url)

Add a new AutoML system that is not already defined in the D3M Interface. It can also be a different version of a pre-existing AutoML system; however, it must be added under a different name

Parameters
  • automl_id – An id to identify the new AutoML system

  • docker_image_url – The Docker image URL of the new AutoML system
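A registration might be sketched as follows. The id and image URL are placeholders (not a real image), and the call is commented out since it requires the package and Docker:

```python
# Hypothetical registration of a custom AutoML system.
existing_ids = {"AlphaD3M", "AutonML"}  # systems already defined above
new_id = "MyAutoML"                     # placeholder id
image_url = "registry.example.org/my-automl:latest"  # placeholder image URL

# A pre-existing id cannot be reused, per the note above:
assert new_id not in existing_ids

# from d3m_interface import AutoML  # assumed import path
# AutoML.add_new_automl(new_id, image_url)
```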