data package#

Module contents#

Use this module to upload data to Edge Impulse.

class edgeimpulse.experimental.data.Sample(
data: BufferedIOBase,
filename: str | None = None,
category: Literal['training', 'testing', 'anomaly', 'split'] | None = 'split',
label: str | None = None,
bounding_boxes: Sequence[dict] | None = None,
metadata: dict | None = None,
sample_id: int | None = None,
structured_labels: Sequence[dict] | None = None,
)[source]#

Bases: object

Wrapper class for sample data, labels, and associated metadata.

Sample data should be contained in a file or file-like object, for example, as the return from open(…, “rb”). The upload_samples() function expects Sample objects as input.

filename#

Name to give the sample when stored in the Edge Impulse project

Type:

str

data#

IO stream of data to be read during the upload process. This can be a BytesIO object, such as the return from open(…, “rb”).

Type:

BufferedIOBase

category#

Which dataset to store your sample in. The default, "split," lets the Edge Impulse server randomly assign the location of your sample based on the project's split ratio (by default, 80% training and 20% testing).

Type:

Optional[Literal[“training”, “testing”, “anomaly”, “split”]]

label#

The label to assign to your sample for classification and regression tasks.

Type:

Optional[str]

bounding_boxes#

Array of dictionary objects that define the bounding boxes for a given sample (object detection projects only). See our image annotation guide for how to format bounding box dictionaries.

Type:

Optional[Sequence[dict]]

metadata#

Dictionary of optional metadata that you would like to include for your particular sample (example: {“source”: “microphone”, “timestamp”: “120”})

Type:

Optional[dict]

sample_id#

Unique ID of the sample. This is automatically assigned by the Edge Impulse server when the sample is uploaded. You can use this ID to retrieve the sample later. This value is ignored when uploading samples and should not be set by the user.

Type:

Optional[int]

structured_labels#

Array of dictionary objects that define the labels in this sample at various intervals. See the multi-label documentation (https://edge-impulse.gitbook.io/docs/edge-impulse-studio/data-acquisition/multi-label) to read more. Example: [{"label": "noise", "startIndex": 0, "endIndex": 5000}, {"label": "water", "startIndex": 5000, "endIndex": 10000}]

Type:

Optional[Sequence[dict]]

bounding_boxes: Sequence[dict] | None = None#
category: Literal['training', 'testing', 'anomaly', 'split'] | None = 'split'#
data: BufferedIOBase#
filename: str | None = None#
label: str | None = None#
metadata: dict | None = None#
sample_id: int | None = None#
structured_labels: Sequence[dict] | None = None#
edgeimpulse.experimental.data.delete_all_samples(
category: str | None = None,
api_key: str | None = None,
timeout_sec: float | None = None,
) GenericApiResponse | None[source]#

Delete all samples in a given category.

If category is set to None, all samples in the project are deleted.

Parameters:
  • category (Optional[str]) – Category (“training”, “testing”, “anomaly”) from which the samples should be deleted. Set to None to delete all samples from all categories.

  • api_key (Optional[str]) – The API key for an Edge Impulse project. This can also be set via the module-level variable edgeimpulse.API_KEY, or the environment variable EI_API_KEY.

  • timeout_sec (Optional[float], optional) – Optional timeout (in seconds) for API calls.

Raises:

e – Unhandled exception from the API

Returns:

API response

Return type:

Optional[GenericApiResponse]

edgeimpulse.experimental.data.delete_sample_by_id(
sample_id: int,
api_key: str | None = None,
timeout_sec: float | None = None,
) GenericApiResponse | None[source]#

Delete a particular sample from a project given the sample ID.

Parameters:
  • sample_id (int) – ID of the sample to delete

  • api_key (Optional[str]) – The API key for an Edge Impulse project. This can also be set via the module-level variable edgeimpulse.API_KEY, or the environment variable EI_API_KEY.

  • timeout_sec (Optional[float], optional) – Optional timeout (in seconds) for API calls.

Raises:

e – Unhandled exception from the API

Returns:

API response, None if no sample is found

Return type:

Optional[GenericApiResponse]

Examples

import os
import logging

import edgeimpulse as ei

# Example of filename that has been uploaded to Studio
filename = "my-image.01.png"

# Remove extension on the filename when querying the dataset in Studio
filename_no_ext = os.path.splitext(filename)[0]

# Get list of IDs that match the given sample filename
infos = ei.experimental.data.get_sample_ids(filename_no_ext)

# Delete the IDs
for info in infos:
    resp = ei.experimental.data.delete_sample_by_id(info.sample_id)
    if resp is None:
        logging.warning(f"Could not delete sample {filename_no_ext}")
edgeimpulse.experimental.data.delete_samples_by_filename(
filename: str,
category: str | None = None,
api_key: str | None = None,
timeout_sec: float | None = None,
) Tuple[GenericApiResponse] | None[source]#

Delete any samples from an Edge Impulse project that match the given filename.

Note: the filename argument must not include the original extension. For example, if you uploaded a file named my-image.01.png, you must provide the filename as my-image.01.

Parameters:
  • filename (str) – Filename of the sample to delete. You should not include any extension on the filename.

  • category (Optional[str]) – Category (“training”, “testing”, “anomaly”) from which the samples should be deleted. Set to ‘None’ to delete all samples from all categories.

  • api_key (Optional[str]) – The API key for an Edge Impulse project. This can also be set via the module-level variable edgeimpulse.API_KEY, or the environment variable EI_API_KEY.

  • timeout_sec (Optional[float], optional) – Optional timeout (in seconds) for API calls.

edgeimpulse.experimental.data.download_samples_by_ids(
sample_ids: int | Sequence[int],
api_key: str | None = None,
timeout_sec: float | None = None,
max_workers: int | None = None,
show_progress: bool | None = False,
pool_maxsize: int | None = 20,
pool_connections: int | None = 20,
) List[Sample][source]#

Download samples by their associated IDs from an Edge Impulse project.

Downloaded sample data is returned as a Sample object, which contains the raw data in a BytesIO object along with associated metadata.

Important! All time series data is returned as a JSON file (in BytesIO format) with a timestamp column. This includes files originally uploaded as CSV, JSON, and CBOR. Edge Impulse Studio removes the timestamp column in any uploaded CSV files and computes an estimated sample rate. The timestamps are computed based on the sample rate, will always start at 0, and will be in milliseconds. These timestamps may not be the same as the original timestamps in the uploaded file.

Parameters:
  • sample_ids (Union[int, Sequence[int]]) – IDs of the samples to download

  • api_key (Optional[str]) – The API key for an Edge Impulse project. This can also be set via the module-level variable edgeimpulse.API_KEY, or the env var EI_API_KEY.

  • timeout_sec (float, optional) – Number of seconds to wait for the download request to complete on the server. None is considered “infinite timeout” and will wait forever.

  • max_workers (int, optional) – The maximum number of workers to use when making concurrent requests. If None, the number of workers will be set to the number of processors on the machine multiplied by 5.

  • show_progress (Optional[bool]) – Show a progress bar while downloading samples. Default is False.

  • pool_maxsize (int, optional) – Maximum size of the connection pool. Defaults to 20.

  • pool_connections (int, optional) – Maximum number of pool connections. Defaults to 20.

Returns:

List of Sample objects with data and metadata as downloaded from the Edge Impulse project. Will be an empty list [] if no samples with the matching IDs are found.

Return type:

List[Sample]

Example

samples = ei.experimental.data.download_samples_by_ids(12345)
print(samples)
edgeimpulse.experimental.data.get_filename_by_id(
sample_id: int,
api_key: str | None = None,
timeout_sec: float | None = None,
) str | None[source]#

Given an ID for a sample in a project, return the filename associated with that sample.

Note that while multiple samples can have the same filename, each sample has a unique sample ID that is provided by Studio when the sample is uploaded.

Parameters:
  • sample_id (int) – Sample ID to look up

  • api_key (Optional[str]) – The API key for an Edge Impulse project. This can also be set via the module-level variable edgeimpulse.API_KEY, or the environment variable EI_API_KEY.

  • timeout_sec (Optional[float], optional) – Optional timeout (in seconds) for API calls.

Raises:

e – Unhandled exception from the API

Returns:

Filename (string) if the sample is found. None if no sample matching the given ID is found.

Return type:

Optional[str]

edgeimpulse.experimental.data.get_sample_ids(
filename: str | None = None,
category: str | None = None,
labels: str | None = None,
api_key: str | None = None,
num_workers: int | None = 4,
timeout_sec: float | None = None,
) List[SampleInfo][source]#

Get the sample IDs and filenames for all samples in a project, filtered by category, labels, and/or filename.

Note that filenames are given by the root of the filename when uploaded. For example, if you upload my-image.01.png, it will be stored in your project with a hash, such as my-image.01.png.4f262n1b.json. To find the ID(s) that match this sample, you must provide the argument filename=my-image.01. Notice the lack of extension and hash.

Because multiple samples (i.e. different sample IDs) can share the same filename, we recommend providing unique filenames for your samples when uploading.

Parameters:
  • filename (Optional[str]) – Filename of the sample(s) (without extension or hash) to look up. Note that multiple samples can have the same filename. If no filename is given, the function will look for samples with any filename.

  • category (Optional[str]) – Category (“training”, “testing”, “anomaly”) to look in for your sample. If no category is given, the function will look in all possible categories.

  • labels (Optional[str]) – Label to look for in your sample. If no label is given, the function will look for samples with any label.

  • api_key (Optional[str]) – The API key for an Edge Impulse project. This can also be set via the module-level variable edgeimpulse.API_KEY, or the environment variable EI_API_KEY.

  • num_workers (Optional[int]) – Number of threads to use to make API calls. Defaults to 4.

  • timeout_sec (Optional[float], optional) – Optional timeout (in seconds) for API calls.

Raises:

e – Unhandled exception from the API

Returns:

List of SampleInfo objects containing the sample ID, filename, category, and label for each sample matching the given criteria.

Return type:

List[SampleInfo]

edgeimpulse.experimental.data.infer_category_and_label_from_filename(sample, file) None[source]#

Extract label and category information from the filename and assign them to the sample object.

Filenames should follow the pattern myfiles/training/wave.1.cbor, where wave is the label and training is the category.

Parameters:
  • sample (object) – The sample object to which label and category will be assigned.

  • file (str) – The filename from which label and category information will be extracted.

Returns:

None

edgeimpulse.experimental.data.numpy_timeseries_to_sample(
values,
sensors: Sequence[Sensor],
sample_rate_ms: int,
) Sample[source]#

Convert numpy values to a sample that can be uploaded to Edge Impulse.

Parameters:
  • values (array) – Numpy array containing the timeseries data. The shape should be (num_samples, time_points, num_sensors)

  • sensors (Sequence[Sensor]) – List of sensor objects representing the sensors used in the data.

  • sample_rate_ms (int) – Time interval in milliseconds between consecutive data points.

Returns:

Sample object that can be uploaded to Edge Impulse

Return type:

Sample

edgeimpulse.experimental.data.pandas_dataframe_to_sample(
df,
sample_rate_ms: int | None = None,
label: str | None = None,
filename: str | None = None,
axis_columns: List[str] | None = None,
metadata: dict | None = None,
category: Literal['training', 'testing', 'split'] = 'split',
) Sample[source]#

Convert a dataframe to a single sample. Can handle both timeseries and non-timeseries data.

To be inferred as timeseries, the dataframe must have:

  • More than one row

  • A sample rate, or an index from which the sample rate can be inferred; such an index must be monotonically increasing and either an int or a date

Parameters:
  • df (DataFrame) – The input DataFrame containing data.

  • sample_rate_ms (int, optional) – The sampling rate of the time series data (in milliseconds). Default is None.

  • label (str, optional) – The label for the sample. Default is None.

  • filename (str, optional) – The filename for the sample. Default is None.

  • axis_columns (List[str], optional) – List of column names representing the axes if the data is multi-dimensional. Default is None.

  • metadata (dict, optional) – Dictionary containing metadata information for the sample. Default is None.

  • category (str or None, optional) – The category this sample belongs to (“training”, “testing”, or “split”). Default is “split”.

Returns:

A sample object containing the data from the dataframe.

Return type:

Sample

edgeimpulse.experimental.data.upload_directory(
directory: str,
category: str | None = None,
label: str | None = None,
metadata: dict | None = None,
transform: callable | None = None,
) UploadSamplesResponse[source]#

Upload a directory of files to Edge Impulse.

The files can be in CBOR, JSON, image, or WAV file formats. You can read more about the different file formats accepted by the Edge Impulse ingestion service here:

https://docs.edgeimpulse.com/reference/ingestion-api

Parameters:
  • directory (str) – The path to the directory containing the files to upload

  • category (str) – Category for the samples (e.g., “training” or “split”)

  • label (str) – Label for the files

  • metadata (dict) – Metadata to add to the file (visible in studio)

  • transform (callable) – A function to manipulate the sample and properties before uploading

Returns:

A response object that contains the results of the upload.

Return type:

UploadSamplesResponse

Raises:

FileNotFoundError – If the specified directory does not exist.

Examples

response = ei.experimental.data.upload_directory(directory="tests/sample_data/gestures")
assert len(response.successes) == 8
assert len(response.fails) == 0
edgeimpulse.experimental.data.upload_exported_dataset(
directory: str,
transform: callable | None = None,
) UploadSamplesResponse[source]#

Upload samples from a downloaded Edge Impulse dataset, preserving the info.labels information.

Use this when you’ve exported your data in the studio.

Parameters:
  • directory (str) – Path to the directory containing the dataset.

  • transform (callable) – A function to manipulate the sample before uploading

Returns:

A response object that contains the results of the upload.

Return type:

UploadSamplesResponse

Raises:

FileNotFoundError – If the labels file (info.labels) is not found in the specified directory.

edgeimpulse.experimental.data.upload_numpy(
data,
labels: Sequence[str],
sensors: Sequence[Sensor],
sample_rate_ms: int,
metadata: dict | None = None,
category: Literal['training', 'testing', 'split'] = 'split',
) UploadSamplesResponse[source]#

Upload numpy arrays as timeseries using the Edge Impulse data acquisition format.

Parameters:
  • data (array) – Numpy array containing the timeseries data. The shape should be (num_samples, time_points, num_sensors)

  • labels (Sequence[str]) – List of labels for the data samples. Can also be a numpy array.

  • sensors (Sequence[Sensor]) – List of Sensor objects representing the sensors used in the data.

  • sample_rate_ms (int) – Time interval in milliseconds between consecutive data points.

  • metadata (dict, optional) – Metadata for all samples being uploaded. Default is None.

  • category (str or None, optional) – Category for all samples being uploaded (“training”, “testing”, or “split”). Default is “split”.

Returns:

A response object that contains the results of the upload.

Return type:

UploadSamplesResponse

Raises:

ValueError – If the length of labels doesn’t match the number of samples or if the number of sensors doesn’t match the number of axes in the data.

Examples

import numpy as np
import edgeimpulse as ei
from edgeimpulse.experimental.data import upload_numpy

ei.API_KEY = "your-api-key" # set your key

# Create 2 samples, each with 3 axes of accelerometer data
samples = np.array(
    [
        [  # sample 1
            [8.81, 0.03, 1.21],
            [9.83, 1.04, 1.27],
            [9.12, 0.03, 1.23],
            [9.14, 2.01, 1.25],
        ],
        [  # sample 2
            [8.81, 0.03, 1.21],
            [9.12, 0.03, 1.23],
            [9.14, 2.01, 1.25],
            [9.14, 2.01, 1.25],
        ],
    ]
)

# The labels for each sample
labels = ["up", "down"]

# The sensors used in the samples
sensors = [
    {"name": "accelX", "units": "ms/s"},
    {"name": "accelY", "units": "ms/s"},
    {"name": "accelZ", "units": "ms/s"},
]

# Upload samples to your Edge Impulse project
resp = upload_numpy(
    sample_rate_ms=100,
    data=samples,
    labels=labels,
    category="training",
    sensors=sensors,
)
print(resp)
edgeimpulse.experimental.data.upload_pandas_dataframe(
df,
feature_cols: List[str],
label_col: str | None = None,
category_col: str | None = None,
metadata_cols: List[str] | None = None,
) UploadSamplesResponse[source]#

Upload non-timeseries data to Edge Impulse where each dataframe row becomes a sample.

Parameters:
  • df (dataframe) – The DataFrame to be uploaded.

  • feature_cols (List[str]) – A list of column names containing features

  • label_col (str, optional) – The name of the column containing labels for the data.

  • category_col (str, optional) – The name of the column containing the category for the data.

  • metadata_cols (List[str], optional) – Optional list of column names containing metadata

Returns:

A response object that contains the results of the upload.

Return type:

UploadSamplesResponse

Raises:

AttributeError – If the input object does not have an iterrows method.

Examples

import edgeimpulse as ei
ei.API_KEY = "your-api-key" # set your key

# Uncomment one of the following
# import pandas as pd
# import dask.dataframe as pd
# import polars as pd

# Construct non-time series data, where each row is a different sample
data = [
    ["desk", "training", "One", -9.81, 0.03, 0.21],
    ["field", "training", "Two", -9.56, 5.34, 1.21],
]
columns = ["loc", "category", "label", "accX", "accY", "accZ"]

# Wrap the data in a DataFrame
df = pd.DataFrame(data, columns=columns)

# Upload non-time series DataFrame (with multiple samples) to the project
response = ei.experimental.data.upload_pandas_dataframe(
    df,
    feature_cols=["accX", "accY", "accZ"],
    label_col="label",
    category_col="category",
    metadata_cols=["loc"],
)
assert len(response.fails) == 0, "Could not upload some files"
edgeimpulse.experimental.data.upload_pandas_dataframe_wide(
df,
sample_rate_ms: int,
data_col_start: int | None = None,
label_col: str | None = None,
category_col: str | None = None,
metadata_cols: List[str] | None = None,
data_col_length: int | None = None,
data_axis_cols: List[str] | None = None,
) UploadSamplesResponse[source]#

Upload a dataframe to Edge Impulse where each column represents a value in the timeseries data and each row becomes an individual sample.

Parameters:
  • df (DataFrame) – The input DataFrame containing time series data.

  • data_col_start (int) – The index of the column from which the time series data begins.

  • sample_rate_ms (int) – The sampling rate of the time series data (in milliseconds).

  • label_col (str, optional) – The column name containing labels for each time series. Default is None.

  • category_col (str, optional) – The column name containing the category for the data. Default is None.

  • metadata_cols (List[str], optional) – List of column names containing metadata information. Default is None.

  • data_col_length (int, optional) – The number of columns that represent a single time series. Default is None.

  • data_axis_cols (List[str], optional) – List of column names representing the axes if the data is multi-dimensional. Default is None.

Returns:

A response object that contains the results of the upload.

Return type:

UploadSamplesResponse

Raises:
  • AttributeError – If the input object does not have an iterrows method.

  • ValueError – If the data_col_length argument is not an integer or if the data_col_start argument is not an integer.

Examples

import edgeimpulse as ei
ei.API_KEY = "your-api-key" # set your key

# Uncomment one of the following
# import pandas as pd
# import dask.dataframe as pd
# import polars as pd

data = [
    [1, "idle", 0.8, 0.7, 0.8, 0.9, 0.8, 0.8, 0.7, 0.8],  # ...continued
    [2, "motion", 0.3, 0.9, 0.4, 0.6, 0.8, 0.9, 0.5, 0.4],  # ...continued
]

df = pd.DataFrame(
    data, columns=["id", "label", "0", "1", "2", "3", "4", "5", "6", "7"]
)

response = ei.experimental.data.upload_pandas_dataframe_wide(
    df,
    label_col="label",
    metadata_cols=["id"],
    data_col_start=2,
    sample_rate_ms=100,
)
assert len(response.successes) == 2
assert len(response.fails) == 0
edgeimpulse.experimental.data.upload_pandas_dataframe_with_group(
df,
timestamp_col: str,
group_by: str,
feature_cols: List[str],
label_col: str | None = None,
category_col: str | None = None,
metadata_cols: List[str] | None = None,
) UploadSamplesResponse[source]#

Upload a dataframe where the rows contain multiple samples and timeseries data for those samples.

It uses a group_by in order to detect which timeseries values belong to which sample.

Parameters:
  • df (dataframe) – The DataFrame to be uploaded.

  • timestamp_col (str) – The name of the column containing the timestamp for the data (in seconds).

  • group_by (str) – The name of the column containing the group for the data.

  • feature_cols (List[str]) – A list of column names containing features.

  • label_col (str, optional) – The name of the column containing labels for the data. Each group must have the same label. Default is None (derived from group name).

  • category_col (str, optional) – The name of the column containing the category for the data. Each group must have the same category. Default is None (random training/test split).

  • metadata_cols (List[str], optional) – A list of column names containing metadata information. Each group must have the same metadata. Default is None.

Returns:

A response object that contains the results of the upload.

Return type:

UploadSamplesResponse

Examples

import edgeimpulse as ei
ei.API_KEY = "your-api-key" # set your key

# Uncomment one of the following
# import pandas as pd
# import dask.dataframe as pd
# import polars as pd

# Create samples
sample_data = [
    ["desk", "sample 1", "training", "idle", 0, -9.81, 0.03, 0.21],
    ["desk", "sample 1", "training", "idle", 0.01, -9.83, 0.04, 0.27],
    ["desk", "sample 1", "training", "idle", 0.02, -9.12, 0.03, 0.23],
    ["desk", "sample 1", "training", "idle", 0.03, -9.14, 0.01, 0.25],
    ["field", "sample 2", "training", "wave", 0, -9.56, 5.34, 1.21],
    ["field", "sample 2", "training", "wave", 0.01, -9.43, 1.37, 1.27],
    ["field", "sample 2", "training", "wave", 0.02, -9.22, -4.03, 1.23],
    ["field", "sample 2", "training", "wave", 0.03, -9.50, -0.98, 1.25],
]
columns = ["loc", "sample_name", "category", "label", "timestamp", "accX", "accY", "accZ"]

# Wrap the data in a DataFrame
df = pd.DataFrame(sample_data, columns=columns)

# Upload time series DataFrame (with multiple samples and multiple dimensions) to the project
response = ei.experimental.data.upload_pandas_dataframe_with_group(
    df,
    group_by="sample_name",
    timestamp_col="timestamp",
    feature_cols=["accX", "accY", "accZ"],
    label_col="label",
    category_col="category",
    metadata_cols=["loc"]
)
assert len(response.fails) == 0, "Could not upload some files"
edgeimpulse.experimental.data.upload_pandas_sample(
df,
label: str | None = None,
sample_rate_ms: int | None = None,
filename: str | None = None,
axis_columns: List[str] | None = None,
metadata: dict | None = None,
category: Literal['training', 'testing', 'split'] = 'split',
) UploadSamplesResponse[source]#

Upload a single dataframe sample to Edge Impulse.

Parameters:
  • df (DataFrame) – The input DataFrame containing data.

  • label (str, optional) – The label for the sample. Default is None.

  • sample_rate_ms (int, optional) – The sampling rate of the time series data (in milliseconds)

  • filename (str, optional) – The filename for the sample. Default is None.

  • axis_columns (List[str], optional) – List of column names representing the axes if the data is multi-dimensional. Default is None.

  • metadata (dict, optional) – Dictionary containing metadata information. Default is None.

  • category (str or None, optional) – Category for the sample (“training”, “testing”, or “split”). Default is “split”.

Returns:

A response object that contains the results of the upload.

Return type:

UploadSamplesResponse

Raises:
  • AttributeError – If the input object does not have a reset_index method.

  • ValueError – If the axis_columns argument is not a list of strings or if the metadata argument is not a dictionary.

Examples

import edgeimpulse as ei
ei.API_KEY = "your-api-key" # set your key

# Uncomment one of the following
# import pandas as pd
# import dask.dataframe as pd
# import polars as pd

# Construct one dataframe for each sample (multidimensional, non-time series)
df_1 = pd.DataFrame([[-9.81, 0.03, 0.21]], columns=["accX", "accY", "accZ"])
df_2 = pd.DataFrame([[-9.56, 5.34, 1.21]], columns=["accX", "accY", "accZ"])

# Optional metadata for all samples being uploaded
metadata = {
    "source": "accelerometer",
    "collection site": "desk",
}

# Upload the first sample
response = ei.experimental.data.upload_pandas_sample(
    df_1,
    label="One",
    filename="001",
    metadata=metadata,
    category="training",
)
assert len(response.fails) == 0, "Could not upload some files"

# Upload the second sample
response = ei.experimental.data.upload_pandas_sample(
    df_2,
    label="Two",
    filename="002",
    metadata=metadata,
    category="training",
)
assert len(response.fails) == 0, "Could not upload some files"
edgeimpulse.experimental.data.upload_plain_directory(
directory: str,
category: str | None = None,
label: str | None = None,
metadata: dict | None = None,
transform: callable | None = None,
) UploadSamplesResponse[source]#

Upload a directory of files to Edge Impulse.

The samples can be in CBOR, JSON, image, or WAV file formats.

Parameters:
  • directory (str) – The path to the directory containing the files to upload.

  • category (str) – Category for the samples

  • label (str) – Label for the files

  • metadata (dict) – Metadata to add to the file (visible in studio)

  • transform (callable) – A function to manipulate the sample and properties before uploading

Returns:

A response object that contains the results of the upload.

Return type:

UploadSamplesResponse

Raises:

FileNotFoundError – If the specified directory does not exist.

Examples

response = ei.experimental.data.upload_plain_directory(directory="tests/sample_data/gestures")
assert len(response.successes) == 8
assert len(response.fails) == 0
edgeimpulse.experimental.data.upload_samples(
samples: Sample | Sequence[Sample],
allow_duplicates: bool | None = False,
api_key: str | None = None,
timeout_sec: float | None = None,
max_workers: int | None = None,
show_progress: bool | None = False,
pool_maxsize: int | None = 20,
pool_connections: int | None = 20,
) UploadSamplesResponse[source]#

Upload one or more samples to an Edge Impulse project using the ingestion service.

Each sample must be wrapped in a Sample object, which contains metadata about that sample. Give this function a single Sample or a sequence of Sample objects to upload to your project. The data field of the Sample must be a raw binary stream, such as a BufferedIOBase object (which you can create with the open(…, “rb”) function).

Parameters:
  • samples (Union[Sample, Sequence[Sample]]) – One or more Sample objects that contain data for that sample along with associated metadata.

  • allow_duplicates (Optional[bool]) – Set to True to allow samples with the same data to be uploaded. If False, the ingestion service will perform a hash of the data and compare it to the hashes of the data already in the project. If a match is found, the service will reject the incoming sample (uploading for other samples will continue).

  • api_key (Optional[str]) – The API key for an Edge Impulse project. This can also be set via the module-level variable edgeimpulse.API_KEY, or the env var EI_API_KEY.

  • timeout_sec (Optional[float], optional) – Number of seconds to wait for an upload request to complete on the server. None is considered “infinite timeout” and will wait forever.

  • max_workers (Optional[int]) – The max number of workers to upload the samples. It should ideally be equal to the number of cores on your machine. If None, the number of workers will be automatically determined.

  • show_progress (Optional[bool]) – Show progress bar while uploading samples. Default is False.

  • pool_maxsize (Optional[int]) – The maximum number of connections to make in a single connection pool (for multithreaded uploads).

  • pool_connections (Optional[int]) – The maximum number of connections to cache for different hosts.

Returns:

A response object that contains the results of the upload. The response object contains two tuples: the first contains the samples that were successfully uploaded, and the second contains the samples that failed to upload along with the error message.

Return type:

UploadSamplesResponse

Examples

import edgeimpulse as ei
from edgeimpulse.experimental.data import Sample

# Create a dataset (with a single Sample)
samples = (
    Sample(
        filename="wave.01.csv",
        data=open("path/to/wave.01.csv", "rb"),
        category="split",
        label="wave",
    ),
)

# Upload samples and print responses
response = ei.experimental.data.upload_samples(samples)
print(response.successes)
print(response.fails)