Cannlytics Utilities

The cannlytics.utils module contains constants and general utility functions for working with cannabis data. Because the module uses `zoneinfo` for time zone handling, Python 3.9+ is recommended.

Constants

There are a number of useful constants in the cannlytics.utils.constants submodule that you can use for standardizing data.

Constant	Description
`ANALYSES`	A map of encountered analyses to their standardized analysis.
`ANALYTES`	A map of encountered analytes to their standardized analyte.
`STANDARD_ANALYSES`	Standard analysis key map.
`STANDARD_FIELDS`	A map of encountered fields to their standardized field.
`STANDARD_UNITS`	A map of standard units by analysis to use when no units are obtainable.
`PRODUCT_TYPES`	A map of encountered product types to their standardized product type.
`STRAINS`	A map of encountered strains to their standardized strain name.
`CODINGS`	Standard value codings.
`DECARB`	Cannabinoid decarboxylation rate.
`DEFAULT_HEADERS`	Default headers to use for HTTP requests so that they are not mistaken for bot traffic.
`RANDOM_STRING_CHARS`	Random characters to use in password generation.
`states`	A map of state abbreviations to state names.
`state_names`	A map of state names to state abbreviations.
`state_time_zones`	A map of state abbreviations to time zones.
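
As an illustration of how a constant such as `DECARB` is typically applied, the sketch below estimates total THC from delta-9 THC and THCA. The value 0.877 is the commonly cited mass ratio of THC to THCA and is an assumption here; check `cannlytics.utils.constants.DECARB` for the value the library actually ships. The names `ASSUMED_DECARB` and `total_thc` are hypothetical and not part of the library.

```python
# Assumed decarboxylation factor: the molar mass of THC divided by the
# molar mass of THCA (roughly 314.5 / 358.5). Not read from cannlytics.
ASSUMED_DECARB = 0.877


def total_thc(delta_9_thc: float, thca: float, decarb: float = ASSUMED_DECARB) -> float:
    """Estimate total THC, in the same units as the inputs."""
    return delta_9_thc + decarb * thca


# Example: 1.0% delta-9 THC and 20.0% THCA.
print(round(total_thc(delta_9_thc=1.0, thca=20.0), 2))  # 18.54
```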

Utility Functions

String Utilities

Function	Description
`camelcase(string)`	Turn a given string to CamelCase.
`camel_to_snake(string)`	Turn a CamelCase string into a snake_case string. This function handles CamelCase better than `snake_case`, but it does not handle all-caps strings well, e.g. "APP_ID".
`kebab_case(string)`	Turn a string into a kebab-case string.
`format_billions(value)`	Format a number in billions.
`format_millions(value)`	Format a number in millions.
`format_thousands(value)`	Format a number in thousands.
`get_keywords(string)`	Get keywords for a given string.
`get_random_string(length, allowed_chars=RANDOM_STRING_CHARS)`	Return a securely generated random string.
`sentence_case(string)`	Format a string as a sentence.
`snake_case(string)`	Turn a given string to snake case. Handles CamelCase, replaces known special characters with preferred namespaces, replaces spaces with underscores, and removes all other nuisance characters.
`strip_whitespace(string)`	Strip whitespace from a string.
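
To make the case conversions above concrete, here is a minimal sketch of one common way to implement them with regular expressions. It is not the library's implementation; the function names `camel_to_snake_sketch` and `kebab_case_sketch` are hypothetical, and the real functions may treat edge cases differently. The all-caps example shows the kind of limitation noted for `camel_to_snake`.

```python
import re


def camel_to_snake_sketch(string: str) -> str:
    """Insert an underscore before every capital letter except the first
    character, then lowercase the result."""
    return re.sub(r'(?<!^)(?=[A-Z])', '_', string).lower()


def kebab_case_sketch(string: str) -> str:
    """Keep alphanumeric runs, lowercase them, and join them with hyphens."""
    words = re.findall(r'[A-Za-z0-9]+', string)
    return '-'.join(word.lower() for word in words)


print(camel_to_snake_sketch('sampleId'))   # sample_id
print(camel_to_snake_sketch('APP_ID'))     # a_p_p__i_d (all caps is handled poorly)
print(kebab_case_sketch('Total THC (%)'))  # total-thc
```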

Number Utilities

Function	Description
`convert_to_numeric(string, strip=False)`	Convert a string to a numeric value, optionally stripping non-numeric characters first.
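
The sketch below shows one plausible reading of this behavior. It assumes that stripping keeps only digits, the decimal point, and the minus sign, and that a value that still cannot be parsed is returned unchanged; neither assumption is confirmed by the library. The name `to_numeric_sketch` is hypothetical.

```python
import re


def to_numeric_sketch(string: str, strip: bool = False):
    """Try to parse a string as a float, optionally removing every
    character that is not a digit, a decimal point, or a minus sign."""
    if strip:
        string = re.sub(r'[^0-9.\-]', '', string)
    try:
        return float(string)
    except ValueError:
        # Assumption: hand back the (possibly stripped) string on failure.
        return string


print(to_numeric_sketch('12.5'))                    # 12.5
print(to_numeric_sketch('<0.01 mg/g', strip=True))  # 0.01
print(to_numeric_sketch('ND'))                      # ND
```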

List Utilities

Function	Description
`sandwich_list(a)`	Create a range that cycles from start to the end to the middle.
`sorted_nicely(a)`	Sort the given iterable in the way that humans expect.
`split_list(a, at_index=None)`	Split a list in half or at a given index.
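
For readers unfamiliar with "human" sorting, the following sketch shows the usual natural-sort technique along with a simple list split. It is an illustration under stated assumptions, not the library's code: the names `natural_sort_sketch` and `split_list_sketch` are hypothetical, and how the library splits odd-length lists is not specified in the table above.

```python
import re


def natural_sort_sketch(items):
    """Sort strings so that embedded numbers compare numerically."""
    def key(item):
        # re.split with a capturing group alternates text and digit runs,
        # so text is always compared with text and numbers with numbers.
        return [int(part) if part.isdigit() else part.lower()
                for part in re.split(r'(\d+)', str(item))]
    return sorted(items, key=key)


def split_list_sketch(items, at_index=None):
    """Split a list at an index, or at the midpoint when none is given."""
    if at_index is None:
        at_index = len(items) // 2  # Assumption: the extra item goes right.
    return items[:at_index], items[at_index:]


print(natural_sort_sketch(['sample10', 'sample2', 'sample1']))
# ['sample1', 'sample2', 'sample10']
print(split_list_sketch([1, 2, 3, 4, 5]))     # ([1, 2], [3, 4, 5])
print(split_list_sketch([1, 2, 3, 4, 5], 1))  # ([1], [2, 3, 4, 5])
```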

Dictionary Utilities

Function	Description
`clean_dictionary(data, function=snake_case)`	Format dictionary keys with given function, snake case by default.
`clean_nested_dictionary(data, function=snake_case)`	Format nested (at most 2 levels) dictionary keys with a given function, snake case by default.
`remove_dict_fields(data, fields)`	Remove multiple keys from a dictionary.
`remove_dict_nulls(data)`	Return a shallow copy of a dictionary with all `None` values excluded.
`update_dict(context, function=camel_to_snake, **kwargs)`	Update dictionary with keyword arguments.
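
The dictionary helpers above follow a familiar pattern: build a new dictionary with a comprehension rather than mutating the input. A minimal sketch of that pattern follows. The names `remove_nulls_sketch` and `clean_keys_sketch` are hypothetical, and `str.lower` stands in for the library's `snake_case` default so the example stays self-contained.

```python
def remove_nulls_sketch(data: dict) -> dict:
    """Return a shallow copy of a dictionary without None values."""
    return {key: value for key, value in data.items() if value is not None}


def clean_keys_sketch(data: dict, function=str.lower) -> dict:
    """Return a copy of a dictionary with a function applied to each key."""
    return {function(key): value for key, value in data.items()}


record = {'Strain': 'Blue Dream', 'THC': 21.3, 'CBD': None}
print(remove_nulls_sketch(record))  # {'Strain': 'Blue Dream', 'THC': 21.3}
print(clean_keys_sketch(record))    # {'strain': 'Blue Dream', 'thc': 21.3, 'cbd': None}
```

Falsy values such as `0` and `''` are kept by `remove_nulls_sketch`, since only `None` is excluded.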

DataFrame Utilities

Function	Description
`clean_column_strings(data, columns)`	Clean the column names of a given DataFrame.
`end_of_period_timeseries(data, period='M')`	Convert a DataFrame from beginning-of-the-period to end-of-the-period timeseries.
`nonzero_columns(data)`	Return the non-zero column names of a DataFrame.
`nonzero_rows(data)`	Return the non-zero row keys of a DataFrame.
`combine_columns(data, new_key, old_key, drop=True)`	Combine two numeric columns of a DataFrame.
`reorder_columns(data, columns)`	Re-order a DataFrame given a specific order of columns. Remaining columns will be appended to the end of the DataFrame.
`reverse_dataframe(data)`	Reverse the ordering of a DataFrame.
`sum_columns(data, new_key, columns, drop=True)`	Sum multiple numeric columns of a DataFrame.
`rmerge(left, right, **kwargs)`	Perform a merge using pandas with optional removal of overlapping column names not associated with the join.
`set_training_period(series, date_start, date_end)`	Helper function to restrict a series to the desired training time period.
`to_excel_with_style(data, file_name, index=False, sheet_name='Sheet1', style=None)`	Save a DataFrame to Excel, optionally applying a style.

Time Utilities

Function	Description
`convert_month_year_to_date(x)`	Convert a month, year series to datetime. E.g. `'April 2022'`.
`end_of_month(value)`	Format a datetime as an ISO formatted date at the end of the month.
`end_of_year(value)`	Format a datetime as an ISO formatted date at the end of the year.
`format_iso_date(date, sep='/')`	Format a human-written date into an ISO formatted date.
`get_timestamp(date, past=0, future=0, zone='utc')`	Get an ISO formatted timestamp.
`months_elapsed(start, end)`	Calculate the months elapsed between two times, returning 0 if a negative time span.
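
As a rough illustration of the date arithmetic described above, the sketch below uses only the standard library. It assumes that elapsed months are counted by calendar month and ignore the day of the month, which the library may or may not do. The names `months_elapsed_sketch` and `end_of_month_sketch` are hypothetical.

```python
import calendar
from datetime import datetime


def months_elapsed_sketch(start: datetime, end: datetime) -> int:
    """Count calendar months between two times, returning 0 if negative."""
    months = (end.year - start.year) * 12 + (end.month - start.month)
    return max(months, 0)


def end_of_month_sketch(value: datetime) -> str:
    """Return the last day of the value's month as an ISO formatted date."""
    last_day = calendar.monthrange(value.year, value.month)[1]
    return value.replace(day=last_day).date().isoformat()


print(months_elapsed_sketch(datetime(2022, 1, 15), datetime(2022, 4, 1)))  # 3
print(months_elapsed_sketch(datetime(2022, 4, 1), datetime(2022, 1, 15)))  # 0
print(end_of_month_sketch(datetime(2024, 2, 10)))                          # 2024-02-29
```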

File Utilities

Function	Description
`decode_pdf(data, destination)`	Save a base-64 encoded string as a PDF.
`encode_pdf(filename)`	Open a PDF file in binary mode.
`get_directory_files(target_dir, file_type)`	Get all of the files of a specified type in a given directory.
`get_number_of_lines(file_name, encoding='utf-16', errors='ignore')`	Read the number of lines in a large file.
`download_file_from_url(url, destination='', ext='')`	Download a file from a URL to a given directory.
`unzip_files(zip_dir, extension='.zip')`	Unzip all files in a specified folder. Alternatively, pass a .zip file to extract that file.
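
Counting lines in a large file is usually done by streaming it rather than loading it into memory. The sketch below shows that technique; it is not the library's implementation, the name `count_lines_sketch` is hypothetical, and it defaults to UTF-8 for the example, whereas the signature above defaults to UTF-16.

```python
import os
import tempfile


def count_lines_sketch(file_name: str, encoding: str = 'utf-8', errors: str = 'ignore') -> int:
    """Count lines by iterating over the file one line at a time."""
    with open(file_name, 'r', encoding=encoding, errors=errors) as file:
        return sum(1 for _ in file)


# Example: write a small temporary file and count its lines.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'results.csv')
    with open(path, 'w', encoding='utf-8') as file:
        file.write('analyte,value\nthc,21.3\ncbd,0.4\n')
    print(count_lines_sketch(path))  # 3
```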