Items and Users Properties

Item and user metadata are called “properties”. For instance, an item might be represented as follows:

{
  "item_id": "57243",
  "category_id": 3,
  "genre": "drama",
  "tags": ["family", "sci-fi"],
  "price": 9.99,
  "summary": "An eccentric yet compassionate extraterrestrial Time Lord zips through time and space [...]",
  "poster": "https://www.themoviedb.org/tv/57243-doctor-who.jpg"
}

Why and How?

Using rich properties has two advantages:

  • it improves the recommendations, especially for the two cold-start problems, where the algorithm can rely only on properties (for example through Semantic Graph Embedding from genres and tags, or Deep Content Extraction from text and images)

  • it lets your client dynamically filter the recommendations down to items satisfying certain criteria (such as a price below a threshold given at runtime, or a geo-location close to given coordinates; see Filtering on Item Property)

As in a SQL database, properties must be defined before you can insert items or users that carry them. The API does not automatically create a property when it detects a new key in a request. This choice surfaces development errors as soon as they occur, instead of silently storing unexpected data.

Use the API Endpoint POST items-properties/ to create new item properties.
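Because undeclared keys are rejected, it can help to check a record on the client side before sending it. The sketch below is only an illustration: the helper name undeclared_keys and the hard-coded set of declared properties are hypothetical, and the real check is done by the API itself.

```python
def undeclared_keys(record, declared_properties, id_key="item_id"):
    """Return the sorted keys of `record` that are not declared properties.

    `declared_properties` is assumed to be the set of property names that
    were already created; the ID key is not a property and is skipped.
    """
    return sorted(
        key for key in record
        if key != id_key and key not in declared_properties
    )


# Hypothetical set of properties created beforehand.
declared = {"category_id", "genre", "tags", "price", "summary", "poster"}

# "rating" was never declared, so the request would be refused.
item = {"item_id": "57243", "genre": "drama", "price": 9.99, "rating": 4.5}
print(undeclared_keys(item, declared))  # ['rating']
```

Running such a check in tests or before a bulk upload reports the offending key names up front.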

Property Types

Item properties can be of various types. The available value_type values are listed in the “Value Types” column of the following table:

| Domain  | Value Types              | Kind               | Filters | Examples                      | Example Values             |
|---------|--------------------------|--------------------|---------|-------------------------------|----------------------------|
| integer | int, int<NBITS>          | scalar             | =, <    | number of pages, year         | 0, -5, 12345               |
| integer | uint, uint<NBITS>        | categorical scalar | =, <    | category ID                   | 0, 5, 12345                |
| float   | float, float<NBITS>      | scalar             | <       | price                         | 3.14, 9.99                 |
| string  | unicode, unicode<NCHARS> | categorical        | =       | UTF8 genre name               | "science-fiction", "drama" |
| bytes   | bytes, bytes<NCHARS>     | categorical        | =       | ASCII tag name, encrypted tag | 0x5906d464                 |
| text    | text                     | long text          |         | review, synopsis              | "An eccentric yet..."      |
| url     | image_url                | image              |         | poster, screenshot            | "https://te.st/img.jpg"    |

Notes:

  • Domains with = in “Filters” support the eq, neq, in, notin operators in recommendation filters

  • Domains with < in “Filters” support the lt, gt operators in recommendation filters

  • Domains with both = and < in “Filters” also support the lte, gte operators in recommendation filters

  • Categorical domains contribute to Semantic Graph Embedding

  • Scalar domains contribute to Shallow Content Extraction

  • Long text and image domains contribute to Deep Content Extraction

  • In integer domains, valid <NBITS> are 8, 16, 32 and 64

  • In float domains, valid <NBITS> are 32 and 64

  • In string and bytes domains, <NCHARS> encodes the maximum size. It defaults to the maximum of 255 chars
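The rules in the table and notes above can be collected into a small lookup. This is a minimal sketch, not the API's own validation: the function names are hypothetical, and it assumes that the <NBITS> or <NCHARS> placeholder is replaced by the number written directly after the type name (for example int32 or unicode64).

```python
import re

# Valid sizes per base type, taken from the notes above.
VALID_BITS = {"int": {8, 16, 32, 64}, "uint": {8, 16, 32, 64}, "float": {32, 64}}
MAX_CHARS = 255  # default and maximum size for unicode / bytes

EQ_OPS = {"eq", "neq", "in", "notin"}
FILTER_OPS = {
    "int": EQ_OPS | {"lt", "gt", "lte", "gte"},   # "=, <" in the table
    "uint": EQ_OPS | {"lt", "gt", "lte", "gte"},  # "=, <" in the table
    "float": {"lt", "gt"},                        # "<" only
    "unicode": EQ_OPS,                            # "=" only
    "bytes": EQ_OPS,                              # "=" only
    "text": set(),                                # no filters listed
    "image_url": set(),                           # no filters listed
}


def parse_value_type(value_type):
    """Split e.g. 'int32' into ('int', 32); raise ValueError if invalid."""
    match = re.fullmatch(r"([a-z_]+)(\d*)", value_type)
    if match is None or match.group(1) not in FILTER_OPS:
        raise ValueError(f"unknown value_type: {value_type!r}")
    base, digits = match.group(1), match.group(2)
    size = int(digits) if digits else None
    if base in VALID_BITS:
        if size is not None and size not in VALID_BITS[base]:
            raise ValueError(f"invalid NBITS for {base}: {size}")
    elif base in ("unicode", "bytes"):
        size = MAX_CHARS if size is None else size
        if not 1 <= size <= MAX_CHARS:
            raise ValueError(f"invalid NCHARS for {base}: {size}")
    elif size is not None:
        raise ValueError(f"{base} does not take a size")
    return base, size


def supported_operators(value_type):
    """Return the filter operators the notes above list for this type."""
    base, _ = parse_value_type(value_type)
    return set(FILTER_OPS[base])


print(parse_value_type("int32"))               # ('int', 32)
print(parse_value_type("unicode"))             # ('unicode', 255)
print(sorted(supported_operators("float64")))  # ['gt', 'lt']
```

A size-less int, uint or float is returned with size None, since the notes do not state a default width for those types.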

Repeated Values

Any property may be “repeated”, meaning that a single item may have an array of many values for this property. This is typically the case with properties like “tags” or “genres”.

Properties with repeated=True also support recommendation filters. For most filter operators, an item with a repeated property satisfies a filter on this property if any of its repeated values satisfies the filter. See Filter Logic for the detailed logic of filtering on repeated values.
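The “any value” rule can be illustrated in a few lines. This sketch models only the common case described above, using a hypothetical helper and three of the operators; the exceptions are covered in Filter Logic and are not reproduced here.

```python
import operator

# A subset of the filter operators, mapped to Python comparisons.
OPS = {"eq": operator.eq, "lt": operator.lt, "gt": operator.gt}


def satisfies(values, op, operand):
    """True if at least one repeated value passes the comparison."""
    compare = OPS[op]
    return any(compare(value, operand) for value in values)


print(satisfies(["family", "sci-fi"], "eq", "sci-fi"))  # True
print(satisfies([1, 2, 3], "gt", 5))                    # False
print(satisfies([], "eq", "sci-fi"))                    # False
```

Under this simplified rule, an item with an empty list never satisfies a filter, because there is no value that could match.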

In JSON, the repeated values are represented using a simple list.

Clients that support binary serialization of the data (instead of JSON) use an “array-optimized” format to represent the repeated values of many items. This format saves memory, CPU and network bandwidth, but it is more complicated. It requires separating the item properties into two groups:

  • items, a single array to represent the non-repeated values, with at least the item ID;

  • items_m2m, a mapping from property name to arrays representing the repeated values.

The arrays in items_m2m store a collection of 2-tuples for each of the many-to-many relations. The first element item_index is the (0-based) index of the item with respect to items. The second element value_id is the property value.

For instance, consider the following bulk of 4 items:

{
  "items": [
    {
      "item_id": "a",
      "price": 1.1,
      "tags": [],
      "genres": ["drama"]
    },
    {
      "item_id": "b",
      "price": 2.2,
      "tags": [1, 2, 3],
      "genres": ["drama", "comedy"]
    },
    {
      "item_id": "c",
      "price": 3.3,
      "tags": [1, 2],
      "genres": []
    },
    {
      "item_id": "d",
      "price": 4.4,
      "tags": [1],
      "genres": ["thriller", "romance"]
    }
  ]
}

The array-optimized format would be:

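items

| id | price |
|----|-------|
| a  | 1.1   |
| b  | 2.2   |
| c  | 3.3   |
| d  | 4.4   |

items_m2m->tags

| item_index | value_id |
|------------|----------|
| 1          | 1        |
| 1          | 2        |
| 1          | 3        |
| 2          | 1        |
| 2          | 2        |
| 3          | 1        |

items_m2m->genres

| item_index | value_id |
|------------|----------|
| 0          | drama    |
| 1          | drama    |
| 1          | comedy   |
| 3          | thriller |
| 3          | romance  |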
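The conversion from the JSON bulk to the array-optimized layout can be sketched as follows. This is an illustration with plain Python lists and a hypothetical function name; it shows only the layout of items and items_m2m, not the binary serialization a real client would apply.

```python
def to_array_optimized(bulk, repeated_properties):
    """Split a list of item dicts into (items, items_m2m).

    items keeps the non-repeated values of each item; items_m2m maps each
    repeated property name to a list of (item_index, value_id) 2-tuples,
    where item_index is the 0-based position of the item in items.
    """
    items = []
    items_m2m = {name: [] for name in repeated_properties}
    for item_index, record in enumerate(bulk):
        items.append(
            {k: v for k, v in record.items() if k not in repeated_properties}
        )
        for name in repeated_properties:
            for value in record.get(name, []):
                items_m2m[name].append((item_index, value))
    return items, items_m2m


bulk = [
    {"item_id": "a", "price": 1.1, "tags": [], "genres": ["drama"]},
    {"item_id": "b", "price": 2.2, "tags": [1, 2, 3], "genres": ["drama", "comedy"]},
    {"item_id": "c", "price": 3.3, "tags": [1, 2], "genres": []},
    {"item_id": "d", "price": 4.4, "tags": [1], "genres": ["thriller", "romance"]},
]
items, items_m2m = to_array_optimized(bulk, ["tags", "genres"])
print(items_m2m["tags"])
# [(1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (3, 1)]
print(items_m2m["genres"])
# [(0, 'drama'), (1, 'drama'), (1, 'comedy'), (3, 'thriller'), (3, 'romance')]
```

Items with an empty list, such as the tags of item a, contribute no rows to the corresponding items_m2m array.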