Items and Users Properties

Item and user metadata are called “properties”. For instance, an item might be represented as follows:

{
  "item_id": "57243",
  "category_id": 3,
  "genre": "drama",
  "tags": ["family", "sci-fi"],
  "price": 9.99,
  "enabled": true,
  "summary": "An eccentric yet compassionate extraterrestrial Time Lord zips through time and space [...]",
  "poster": "https://www.themoviedb.org/tv/57243-doctor-who.jpg"
}

Why and How?

Using rich properties gives two advantages:

it improves the recommendations, especially for both cold-start problems, where the algorithm relies only on properties (such as Semantic Graph Embedding from genres and tags, or Deep Content Extraction from text and images)
it enables your client to dynamically filter the recommendations to items satisfying certain criteria (such as a price below a threshold given at runtime, or a geo-location close to given coordinates; see Filtering on Item Property)

As in a SQL database, properties must be defined before you can insert items or users with these properties. The API does not automatically create a new property when it detects an unknown key in an API request. This choice surfaces development errors, such as a misspelled property name, as soon as they occur.
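To illustrate why defining properties first catches mistakes early, here is a minimal local sketch. It is not part of the API: the `DEFINED_PROPERTIES` set and the `undefined_keys` helper are hypothetical names used only for this example, and the actual check is performed server-side.

```python
# Hypothetical local pre-check mirroring the "define before insert" rule.
# The property names below are taken from the example item at the top of this page.
DEFINED_PROPERTIES = {
    "category_id", "genre", "tags", "price", "enabled", "summary", "poster",
}


def undefined_keys(item: dict, defined: set, id_key: str = "item_id") -> list:
    """Return the keys of `item` that are neither the ID nor a defined property."""
    return sorted(k for k in item if k != id_key and k not in defined)


# "prise" is a typo for "price": it is reported instead of silently
# becoming a new property.
item = {"item_id": "57243", "genre": "drama", "prise": 9.99}
print(undefined_keys(item, DEFINED_PROPERTIES))
```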

In most use cases of the Crossing Minds API, you don’t need to upload and maintain your item catalog yourself. Instead, it is more common to leverage a CDP integration (like Segment, mParticle, or Shopify), or to share entire data files with your dedicated ML Engineer. Nevertheless, for some use cases it is preferable that you maintain all the properties by calling the API endpoints. See Uploading and Maintaining An Item Catalog below.

Property Types

Item properties can be of various types. The available value_type values are listed in the “Value Types” column of the following table:

Domain	Value Types	Kind	Filters	Examples	Example Values
integer	`int`, `int<NBITS>`	scalar	`=`, `<`	number of pages, year	`0`, `-5`, `12345`
integer	`uint`, `uint<NBITS>`	categorical scalar	`=`, `<`	category ID	`0`, `5`, `12345`
boolean	`bool`	scalar	`=`	is enabled	`true`, `false`
float	`float`, `float<NBITS>`	scalar	`<`	price	`3.14`, `9.99`
string	`unicode`, `unicode<NCHARS>`	categorical	`=`	UTF8 genre name	`"science-fiction"`, `"drama"`
bytes	`bytes`, `bytes<NCHARS>`	categorical	`=`	ASCII tag name, encrypted tag	`0x5906d464`
text	`text`	long text	`ft`	review, synopsis	`"An eccentric yet..."`
url	`image_url`	image		poster, screenshot	`"https://te.st/img.jpg"`

Notes:

Domains with = in “Filters” support the eq, neq, in, notin operators in recommendation filters
Domains with < in “Filters” support the lt, gt operators in recommendation filters
Domains with both = and < in “Filters” also support the lte, gte operators in recommendation filters
Domains with ft in “Filters” support the ftsearch operator in recommendation filters
Categorical domains contribute to Semantic Graph Embedding
Scalar domains contribute to Shallow Content Extraction
Long text and image domains contribute to Deep Content Extraction
In integer domains, valid <NBITS> are 8, 16, 32 and 64
In float domains, valid <NBITS> are 32 and 64
In string and bytes domains, <NCHARS> encodes the maximum size. It defaults to the maximum of 255 chars
In the boolean domain, any value other than true, false, "true", "false", 1 or 0 will raise an exception.
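The notes above can be condensed into a small lookup. The following is a sketch only: `supported_operators` and `is_accepted_boolean` are hypothetical helpers written for this page, not functions of any client library, and the server remains the authority on validation.

```python
def supported_operators(filters: str) -> list:
    """Map a "Filters" cell of the table (e.g. "=, <") to filter operator names."""
    symbols = {s.strip() for s in filters.split(",") if s.strip()}
    ops = []
    if "=" in symbols:
        ops += ["eq", "neq", "in", "notin"]
    if "<" in symbols:
        ops += ["lt", "gt"]
    if {"=", "<"} <= symbols:  # both equality and ordering are available
        ops += ["lte", "gte"]
    if "ft" in symbols:
        ops += ["ftsearch"]
    return ops


def is_accepted_boolean(value) -> bool:
    """Check a value against the accepted list: true, false, "true", "false", 1, 0."""
    if isinstance(value, bool):
        return True
    if isinstance(value, int):
        return value in (0, 1)
    if isinstance(value, str):
        return value in ("true", "false")
    return False


print(supported_operators("=, <"))  # integer domains
print(supported_operators("<"))     # float domain
print(is_accepted_boolean("yes"))   # not in the accepted list
```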

Repeated Values

Any property may be “repeated”, meaning that a single item may have an array of many values for this property. This is typically the case with properties like “tags” or “genres”.

Properties with repeated=True also support recommendation filters. For most filter operators, an item with a repeated property satisfies a filter on this property if any of the repeated values satisfies the filter. See Filter Logic for the detailed logic of filtering on repeated values.
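The “any value” rule can be sketched as follows. This is an illustration only: it covers just the eq, lt and gt operators, the `satisfies` helper is hypothetical, and the treatment of an empty list is an assumption of this sketch; Filter Logic remains the reference for the exact behavior.

```python
import operator

# Only three operators are modeled here; the others may follow different rules.
OPERATORS = {"eq": operator.eq, "lt": operator.lt, "gt": operator.gt}


def satisfies(value, op: str, operand) -> bool:
    """Return True if a scalar value, or ANY value of a repeated property, passes."""
    compare = OPERATORS[op]
    values = value if isinstance(value, list) else [value]
    # Assumption: an item with no values never satisfies the filter.
    return any(compare(v, operand) for v in values)


print(satisfies(["family", "sci-fi"], "eq", "sci-fi"))  # repeated property
print(satisfies(9.99, "lt", 10))                        # non-repeated property
```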

Array-Optimized Format (Optional)

In JSON, the repeated values are represented using a simple list.

When using a client that supports binary serialization of the data (not JSON), you can use an “array-optimized” format to represent the repeated values of many items. This format saves memory, CPU and network bandwidth, but it is more complicated. It requires separating the item properties into two groups:

items, a single array to represent the non-repeated values, with at least the item ID;
items_m2m, a mapping from property name to arrays representing the repeated values.

The arrays in items_m2m store a collection of 2-tuples for each of the many-to-many relations. The first element item_index is the (0-based) index of the item with respect to items. The second element value_id is the property value.

For instance, take the following batch of 4 items:

{
  "items": [
    {
      "item_id": "a",
      "price": 1.1,
      "tags": [],
      "genres": ["drama"]
    },
    {
      "item_id": "b",
      "price": 2.2,
      "tags": [1, 2, 3],
      "genres": ["drama", "comedy"]
    },
    {
      "item_id": "c",
      "price": 3.3,
      "tags": [1, 2],
      "genres": []
    },
    {
      "item_id": "d",
      "price": 4.4,
      "tags": [1],
      "genres": ["thriller", "romance"]
    }
  ]
}

The array-optimized format would be:

items

id	price
a	1.1
b	2.2
c	3.3
d	4.4

items_m2m->tags

item_index	value_id
1	1
1	2
1	3
2	1
2	2
3	1

items_m2m->genres

item_index	value_id
0	drama
1	drama
1	comedy
3	thriller
3	romance
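The conversion from the JSON representation to the array-optimized layout can be sketched in a few lines. This is a minimal illustration using plain Python lists and tuples; the `to_array_optimized` function is a hypothetical helper, and a client with binary serialization would likely expect typed arrays instead.

```python
def to_array_optimized(items: list, repeated_names: list):
    """Split items into non-repeated rows and (item_index, value_id) pairs."""
    flat_items = []
    items_m2m = {name: [] for name in repeated_names}
    for item_index, item in enumerate(items):
        # Keep only the non-repeated properties (including the item ID).
        flat_items.append({k: v for k, v in item.items() if k not in items_m2m})
        # Emit one 2-tuple per repeated value, indexed by position in `items`.
        for name in repeated_names:
            for value in item.get(name, []):
                items_m2m[name].append((item_index, value))
    return flat_items, items_m2m


items = [
    {"item_id": "a", "price": 1.1, "tags": [], "genres": ["drama"]},
    {"item_id": "b", "price": 2.2, "tags": [1, 2, 3], "genres": ["drama", "comedy"]},
    {"item_id": "c", "price": 3.3, "tags": [1, 2], "genres": []},
    {"item_id": "d", "price": 4.4, "tags": [1], "genres": ["thriller", "romance"]},
]
flat_items, items_m2m = to_array_optimized(items, ["tags", "genres"])
print(items_m2m["tags"])
print(items_m2m["genres"])
```

Items with an empty list, such as "a" for tags or "c" for genres, simply contribute no pairs.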

Uploading and Maintaining An Item Catalog

In rare use cases, you cannot maintain your item catalog using only a CDP integration or sharing entire data files regularly. Then, you can use the API endpoints to upload and maintain the property values.

This setup is needed when it is important that newly created items are recommended immediately, without waiting for the next catalog sync. This use case is fairly rare, as typically items can be created in advance, and controlling which items are recommended is simply achieved with filters.

Use the API Endpoint POST items-properties/ to create new item properties.
Use the API Endpoints PUT items/<str:item_id>/properties/, PATCH items/<str:item_id>/properties/, and DELETE items/<str:item_id>/properties/ to replace, partially update, or delete the property values of a single item.
Use the API Endpoints PUT items-bulk/properties/, PATCH items-bulk/properties/, and DELETE items-bulk/properties/ to replace, partially update, or delete item property values in bulk.

Sending large amounts of data over HTTP can degrade performance. We suggest following these guidelines for better results:

use partial updates when possible, to update only what needs to be updated. A partial update is “shallow”, meaning you can send only a subset of the key/value property mapping. For repeated properties, however, you still send the full list of values (which is how previous values are “deleted” from the list)
use the bulk operations when possible, to send many updates at the same time. Do not put huge payloads in a single HTTP call, as they would time out. A good rule of thumb is to keep the latency of each HTTP call below one second, which corresponds to roughly a few hundred KB. Depending on the size of your values, this may be anywhere from ~50 to ~500 items per bulk.
use the default wait_for_completion=true when developing, to get error messages synchronously. When moving to production, use wait_for_completion=false so that our backend moves the update to a background queue and immediately returns an empty response.
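The batching guideline can be sketched as follows. This is only an illustration: `make_batches` is a hypothetical helper, the default limits of 500 items and 300 KB are assumptions derived from the rule of thumb above, and the size estimate uses the JSON encoding of each item rather than the exact size of the HTTP request.

```python
import json


def make_batches(items: list, max_items: int = 500, max_bytes: int = 300_000) -> list:
    """Group items into batches bounded by item count and approximate JSON size."""
    batches, current, current_bytes = [], [], 0
    for item in items:
        size = len(json.dumps(item).encode("utf-8"))
        # Start a new batch when adding this item would exceed either limit.
        if current and (len(current) >= max_items or current_bytes + size > max_bytes):
            batches.append(current)
            current, current_bytes = [], 0
        current.append(item)
        current_bytes += size
    if current:
        batches.append(current)
    return batches


catalog = [{"item_id": str(i), "price": float(i)} for i in range(1200)]
batches = make_batches(catalog)
print([len(batch) for batch in batches])
```

Each batch would then be sent in its own bulk call, for example PATCH items-bulk/properties/, with wait_for_completion=true during development and wait_for_completion=false in production.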