Miscellaneous
- recordlinkage.index_split(index, chunks)
Function to split pandas.Index and pandas.MultiIndex objects.
Split pandas.Index and pandas.MultiIndex objects into chunks. This function is based on numpy.array_split().
- Parameters:
index (pandas.Index, pandas.MultiIndex) – A pandas.Index or pandas.MultiIndex to split into chunks.
chunks (int) – The number of parts to split the index into.
- Returns:
list – A list with chunked pandas.Index or pandas.MultiIndex objects.
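The chunk sizes can be illustrated with a small sketch. It assumes that index_split follows the sizing rule of numpy.array_split(), as the description above suggests: when the length is not divisible by the number of chunks, the leading chunks receive one extra element. The helper split_index_sketch is a hypothetical stand-in written for illustration only; it is not part of recordlinkage.

```python
import pandas as pd


def split_index_sketch(index, chunks):
    """Hypothetical stand-in for index_split (illustration only).

    Splits `index` into `chunks` consecutive parts using the sizing
    rule of numpy.array_split: leading parts get one extra element
    when the length is not evenly divisible.
    """
    chunks = int(chunks)
    if chunks <= 0:
        raise ValueError("chunks must be a positive integer")

    base, extras = divmod(len(index), chunks)
    # The first `extras` parts hold base + 1 elements, the rest hold base.
    sizes = [base + 1] * extras + [base] * (chunks - extras)

    parts, start = [], 0
    for size in sizes:
        # Slicing a pandas.Index or pandas.MultiIndex keeps its type.
        parts.append(index[start:start + size])
        start += size
    return parts


# 3 x 4 = 12 candidate record pairs, stored as a MultiIndex.
pairs = pd.MultiIndex.from_product([range(3), range(4)])

parts = split_index_sketch(pairs, 5)
sizes = [len(part) for part in parts]
print(sizes)  # [3, 3, 2, 2, 2]
```

With the library installed, the corresponding call would be recordlinkage.index_split(pairs, 5), which returns a list of MultiIndex chunks.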
- recordlinkage.get_option(pat)
Retrieves the value of the specified option.
The available options with their descriptions:
- classification.return_type : str
The format of the classification result. The value ‘index’ returns the classification result as a pandas.MultiIndex. The MultiIndex contains the predicted matching record pairs. The value ‘series’ returns a pandas.Series with zeros (distinct) and ones (matches). The argument value ‘array’ will return a numpy.ndarray with zeros and ones. [default: index] [currently: index]
- indexing.pairs : str
Specifies the format in which record pairs are stored. By default, record pairs generated by the toolkit are returned in a pandas.MultiIndex object (‘multiindex’ option).
Valid values: ‘multiindex’ [default: multiindex] [currently: multiindex]
- Parameters:
pat (str) – Regexp which should match a single option. Note: partial matches are supported for convenience, but unless you use the full option name (e.g. x.y.z.option_name), your code may break in future versions if new options with similar names are introduced.
- Returns:
result – The value of the option.
- Raises:
OptionError – If no such option exists.
- recordlinkage.set_option(pat, value)
Sets the value of the specified option.
The available options with their descriptions:
- classification.return_type : str
The format of the classification result. The value ‘index’ returns the classification result as a pandas.MultiIndex. The MultiIndex contains the predicted matching record pairs. The value ‘series’ returns a pandas.Series with zeros (distinct) and ones (matches). The argument value ‘array’ will return a numpy.ndarray with zeros and ones. [default: index] [currently: index]
- indexing.pairs : str
Specifies the format in which record pairs are stored. By default, record pairs generated by the toolkit are returned in a pandas.MultiIndex object (‘multiindex’ option).
Valid values: ‘multiindex’ [default: multiindex] [currently: multiindex]
- Parameters:
pat (str) – Regexp which should match a single option. Note: partial matches are supported for convenience, but unless you use the full option name (e.g. x.y.z.option_name), your code may break in future versions if new options with similar names are introduced.
value – New value of the option.
- Returns:
None
- Raises:
OptionError – If no such option exists.
- recordlinkage.reset_option(pat)
Reset one or more options to their default value.
Pass “all” as argument to reset all options.
The available options with their descriptions:
- classification.return_type : str
The format of the classification result. The value ‘index’ returns the classification result as a pandas.MultiIndex. The MultiIndex contains the predicted matching record pairs. The value ‘series’ returns a pandas.Series with zeros (distinct) and ones (matches). The argument value ‘array’ will return a numpy.ndarray with zeros and ones. [default: index] [currently: index]
- indexing.pairs : str
Specifies the format in which record pairs are stored. By default, record pairs generated by the toolkit are returned in a pandas.MultiIndex object (‘multiindex’ option).
Valid values: ‘multiindex’ [default: multiindex] [currently: multiindex]
- Parameters:
pat (str/regex) – If specified, only options matching prefix* will be reset. Note: partial matches are supported for convenience, but unless you use the full option name (e.g. x.y.z.option_name), your code may break in future versions if new options with similar names are introduced.
- Returns:
None
- recordlinkage.describe_option(pat, _print_desc=False)
Prints the description for one or more registered options.
Call with no arguments to get a listing for all registered options.
The available options with their descriptions:
- classification.return_type : str
The format of the classification result. The value ‘index’ returns the classification result as a pandas.MultiIndex. The MultiIndex contains the predicted matching record pairs. The value ‘series’ returns a pandas.Series with zeros (distinct) and ones (matches). The argument value ‘array’ will return a numpy.ndarray with zeros and ones. [default: index] [currently: index]
- indexing.pairs : str
Specifies the format in which record pairs are stored. By default, record pairs generated by the toolkit are returned in a pandas.MultiIndex object (‘multiindex’ option).
Valid values: ‘multiindex’ [default: multiindex] [currently: multiindex]
- Parameters:
- Returns:
None by default, the description(s) as a unicode string if _print_desc is False