
Cannlytics Data Module

The cannlytics.data module is a digital toolbox for accessing, collecting, cleaning, augmenting, standardizing, saving, and analyzing cannabis data.

Data Management

The core data management tools found in cannlytics.data.data include:

Data Aggregation

Function Description
aggregate_datasets(directory, on='sample_id', how='left', replace='right', reverse=True, concat=False) Aggregate datasets. Leverages rmerge to combine each dataset in a given directory.
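To illustrate the idea of combining every dataset in a directory on a shared key, here is a minimal standard-library sketch. It is not the cannlytics implementation and does not reproduce rmerge. The names read_rows, left_merge, and aggregate_csvs are hypothetical, the input is assumed to be CSV files, and letting the right-hand value win on shared columns is only one reading of replace='right'.

```python
import csv
import tempfile
from pathlib import Path


def read_rows(path):
    """Read a CSV file into a list of dictionaries."""
    with open(path, newline='', encoding='utf-8') as f:
        return list(csv.DictReader(f))


def left_merge(left, right, on='sample_id'):
    """Left-join `right` onto `left` (assumption: right values win on shared columns)."""
    # If a key repeats in `right`, the last row with that key is used.
    lookup = {row[on]: row for row in right}
    return [{**row, **lookup.get(row[on], {})} for row in left]


def aggregate_csvs(directory, on='sample_id'):
    """Merge every CSV in `directory`, in sorted filename order."""
    files = sorted(Path(directory).glob('*.csv'))
    if not files:
        return []
    merged = read_rows(files[0])
    for path in files[1:]:
        merged = left_merge(merged, read_rows(path), on=on)
    return merged


# Usage with two small, made-up files.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, 'a_results.csv').write_text(
        'sample_id,thc\nS1,20.1\nS2,18.4\n', encoding='utf-8')
    Path(tmp, 'b_labels.csv').write_text(
        'sample_id,strain\nS1,Blue Dream\n', encoding='utf-8')
    combined = aggregate_csvs(tmp)
```

Because this is a left join, a sample that appears only in the first file (S2 here) is kept without the extra columns.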

Data Cleaning

Function Description
find_first_value(string, breakpoints=None) Find the first value in a string, be it a digit, an 'ND', a '<', or another specified breakpoint.
parse_data_block(div, tag='span') Parse an HTML data block into a dictionary.
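The search for a first value can be sketched with a regular expression. This is an illustration under stated assumptions, not the library's code: the function name first_value_index is hypothetical, the default breakpoints are assumed to be 'ND' and '<', and returning the position of the match (rather than the value itself) is a guess at the intended behavior.

```python
import re


def first_value_index(string, breakpoints=None):
    """Return the index of the first digit or breakpoint, or None if absent."""
    if breakpoints is None:
        breakpoints = ['ND', '<']  # Assumed defaults for this sketch.
    # Escape the breakpoints so that characters like '<' are matched literally.
    pattern = '|'.join([r'\d'] + [re.escape(b) for b in breakpoints])
    match = re.search(pattern, string)
    return match.start() if match else None


# Made-up result strings of the kind found in lab reports.
numeric = first_value_index('THC 12.5 mg')
not_detected = first_value_index('CBD ND')
below_limit = first_value_index('CBN <LOQ')
```

Splitting the string at the returned index would separate an analyte name from its reported value.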

Data Augmentation

Function Description
create_hash(public_key, private_key='cannlytics.eth') Create a hash (HMAC-SHA256) that is unique to the provided data, the public_key. The private_key can be used to sign your data; the default is Cannlytics' public key, 'cannlytics.eth'.
create_sample_id(private_key, public_key, salt='') Create a hash to be used as a sample ID. The standard is to use: (1) private_key = producer, (2) public_key = product_name, (3) salt = date_tested.
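An HMAC-SHA256 hash of this kind can be sketched with Python's standard hmac and hashlib modules. The functions sketch_hash and sketch_sample_id are hypothetical stand-ins, not the cannlytics functions. In particular, how the library normalizes its inputs and how it combines the salt with the public key are not stated here, so simple string concatenation is assumed.

```python
import hashlib
import hmac


def sketch_hash(public_key, private_key='cannlytics.eth'):
    """Return a hex HMAC-SHA256 digest of `public_key`, keyed by `private_key`."""
    key = private_key.encode('utf-8')
    message = public_key.encode('utf-8')
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def sketch_sample_id(private_key, public_key, salt=''):
    """Hash the product name plus salt, keyed by the producer (assumed scheme)."""
    # Assumption: the salt is appended directly to the public key.
    return sketch_hash(public_key + salt, private_key=private_key)


# Made-up producer, product name, and test date.
sample_id = sketch_sample_id(
    'Example Producer',
    'Example Product',
    salt='2022-01-01',
)
```

The same three inputs always yield the same 64-character hexadecimal ID, while changing any one of them, such as the test date, yields a different ID.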

Data Saving

Function Description
write_to_worksheet(ws, values) Write data to an Excel Worksheet.

Cannabis Patent Data

With the cannlytics.data.patents submodule you can find and curate data for cannabis patents.

Function Description
search_patents(query, limit=50, details=False, pause=None, term='') Search for patents.
get_patent_details(data=None, patent_number=None, patent_url=None, user_agent=None, fields=None, search_field='patentNumber', search_fields='patentNumber', query='patentNumber') Get details for a given patent, given its patent number and URL.

Example

from cannlytics.data.patents import (
    get_patent_details,
    search_patents,
)

# Search for cannabis plant patents.
patents = search_patents('cannabis cultivar', limit=1000, term='TTL%2F')

# Get patent details.
patent = get_patent_details(
    patent_number='PP34051',
    patent_url='https://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO2&Sect2=HITOFF&u=%2Fnetahtml%2FPTO%2Fsearch-adv.htm&r=7&f=G&l=50&d=PTXT&p=1&S1=%22marijuana+plant%22&OS=%22marijuana+plant%22&RS=%22marijuana+plant%22',
)

Web Data

There are a number of web data tools in cannlytics.data.web, including:

Function Description
format_params(parameters, **kwargs) Format the given keyword arguments as HTTP request parameters.
get_page_metadata(url) Get the metadata of a web page.
get_page_description(html) Get the description of a web page.
get_page_image(html, index=0) Get an image on a web page, the first image by default.
get_page_favicon(html, url='') Get the favicon from a web page.
get_page_theme_color(html) Get the theme color of a web page.
get_page_phone_number(html, response, index=0) Get a phone number on a web page, the first phone number by default.
get_page_email(html, response) Get an email on a web page, the last email by default.
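To give a sense of the kind of metadata these tools retrieve, here is a minimal sketch that reads a page's description, theme color, and favicon from an HTML string with Python's standard html.parser. It is not how cannlytics.data.web is implemented. The names MetaTagParser and parse_page_head are hypothetical, and the sample HTML is made up.

```python
from html.parser import HTMLParser


class MetaTagParser(HTMLParser):
    """Collect <meta> tags and the first favicon <link> from an HTML string."""

    def __init__(self):
        super().__init__()
        self.meta = {}
        self.favicon = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'meta':
            # Meta tags are keyed by `name` or, for Open Graph, `property`.
            key = attrs.get('name') or attrs.get('property')
            if key and 'content' in attrs:
                # Keep the first occurrence of each key.
                self.meta.setdefault(key.lower(), attrs['content'])
        elif tag == 'link' and self.favicon is None:
            rel = (attrs.get('rel') or '').lower().split()
            if 'icon' in rel:
                self.favicon = attrs.get('href')


def parse_page_head(html):
    """Return the description, theme color, and favicon found in `html`."""
    parser = MetaTagParser()
    parser.feed(html)
    return {
        'description': parser.meta.get('description'),
        'theme_color': parser.meta.get('theme-color'),
        'favicon': parser.favicon,
    }


# A made-up page for illustration.
html = """
<html><head>
<meta name="description" content="Cannabis data and analytics.">
<meta name="theme-color" content="#45B649">
<link rel="shortcut icon" href="/favicon.ico">
</head><body></body></html>
"""
page = parse_page_head(html)
```

Any field that is missing from the page comes back as None, so callers can decide on their own fallbacks.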