Cannlytics Data Module
The cannlytics.data module is a digital toolbox for accessing, collecting, cleaning, augmenting, standardizing, saving, and analyzing cannabis data.
Data Management
The core data management tools found in cannlytics.data.data include:
Data Aggregation
Function | Description |
---|---|
aggregate_datasets(directory, on='sample_id', how='left', replace='right', reverse=True, concat=False) | Aggregate datasets. Leverages rmerge to combine each dataset in a given directory. |
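The idea of combining several datasets on a shared key can be sketched with plain Python records. This is only an illustration, not the library's rmerge implementation: the helper names left_merge and aggregate are hypothetical, and it assumes that replace='right' means overlapping fields take the right-hand value.

```python
def left_merge(left, right, on="sample_id"):
    """Left-join two lists of dict records on a shared key.

    Assumption: when both sides carry the same field, the right-hand
    value overwrites the left-hand one.
    """
    # Index the right-hand records by the join key for fast lookup.
    lookup = {row[on]: row for row in right if on in row}
    merged = []
    for row in left:
        combined = dict(row)
        # Keep every left-hand row; enrich it when a match exists.
        combined.update(lookup.get(row.get(on), {}))
        merged.append(combined)
    return merged


def aggregate(datasets, on="sample_id"):
    """Fold a sequence of datasets into one by repeated left merges."""
    datasets = list(datasets)
    if not datasets:
        return []
    result = datasets[0]
    for dataset in datasets[1:]:
        result = left_merge(result, dataset, on=on)
    return result


# Hypothetical example data.
lab_results = [
    {"sample_id": "a1", "total_thc": 21.5},
    {"sample_id": "b2", "total_thc": 18.2},
]
producers = [
    {"sample_id": "a1", "producer": "Example Farm"},
]
combined = aggregate([lab_results, producers])
print(combined)
```

Rows without a match on the right are kept unchanged, which is the behavior a how='left' merge implies.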
Data Cleaning
Function | Description |
---|---|
find_first_value(string, breakpoints=None) | Find the first value in a string, be it a digit, 'ND', '<', or another specified breakpoint. |
parse_data_block(div, tag='span') | Parse an HTML data block into a dictionary. |
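To illustrate what finding the first value in a string might look like, here is a minimal sketch using only the standard library. It is not the library's code: the name first_value_index is hypothetical, and it assumes the result is the index of the earliest match and that breakpoints are treated as regular expressions.

```python
import re


def first_value_index(string, breakpoints=None):
    """Return the index of the earliest breakpoint match, or None."""
    if breakpoints is None:
        # Assumed defaults: any digit, the 'ND' marker, or '<'.
        breakpoints = [r"\d", "ND", "<"]
    positions = []
    for pattern in breakpoints:
        match = re.search(pattern, string)
        if match:
            positions.append(match.start())
    return min(positions) if positions else None


# Split a lab result line into its analyte name and its value.
text = "Total THC 21.5 %"
index = first_value_index(text)
print(text[:index].strip(), "|", text[index:])
```

Knowing where the value starts makes it possible to separate an analyte name from its measurement, including non-detects ('ND') and values below a limit ('<').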
Data Augmentation
Function | Description |
---|---|
create_hash(public_key, private_key='cannlytics.eth') | Create a hash (HMAC-SHA256) that is unique to the provided data, the public_key. The private_key can be used to sign your data, with the default being Cannlytics' public key, 'cannlytics.eth'. |
create_sample_id(private_key, public_key, salt='') | Create a hash to be used as a sample ID. The standard is to use: (1) private_key = producer, (2) public_key = product_name, (3) salt = date_tested. |
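The hashing described above can be approximated with Python's standard hmac and hashlib modules. The sketch below is an assumption-laden illustration rather than the library's implementation: the function names are hypothetical, and how the salt is combined with the other values (here, appended to the message) is a guess.

```python
import hashlib
import hmac


def hmac_sha256_hex(message, key):
    """Return the hex HMAC-SHA256 digest of `message` signed with `key`."""
    return hmac.new(
        key.encode("utf-8"),
        message.encode("utf-8"),
        hashlib.sha256,
    ).hexdigest()


def sample_id_sketch(producer, product_name, date_tested=""):
    """Derive a repeatable ID from producer, product name, and test date.

    Assumption: the salt is simply appended to the message; the library
    may combine the three values differently.
    """
    return hmac_sha256_hex(product_name + date_tested, producer)


# Hypothetical producer, product, and test date.
sample_id = sample_id_sketch("Example Farm", "Blue Dream", "2022-06-01")
print(sample_id)
```

Because the digest depends only on its inputs, the same producer, product name, and test date always yield the same ID, while a change to any of them yields a different one.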
Data Saving
Function | Description |
---|---|
write_to_worksheet(ws, values) | Write data to an Excel Worksheet. |
Cannabis Patent Data
With the cannlytics.data.patents submodule you can find and curate data for cannabis patents.
Function | Description |
---|---|
search_patents(query, limit=50, details=False, pause=None, term='') | Search for patents. |
get_patent_details(data=None, patent_number=None, patent_url=None, user_agent=None, fields=None, search_field='patentNumber', search_fields='patentNumber', query='patentNumber') | Get details for a given patent, given its patent number and URL. |
Example
from cannlytics.data.patents import (
    get_patent_details,
    search_patents,
)
# Search for cannabis plant patents.
patents = search_patents('cannabis cultivar', limit=1000, term='TTL%2F')
# Get patent details.
patent = get_patent_details(
    patent_number='PP34051',
    patent_url='https://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO2&Sect2=HITOFF&u=%2Fnetahtml%2FPTO%2Fsearch-adv.htm&r=7&f=G&l=50&d=PTXT&p=1&S1=%22marijuana+plant%22&OS=%22marijuana+plant%22&RS=%22marijuana+plant%22',
)
Web Data
There are a number of web data tools in cannlytics.data.web, including:
Function | Description |
---|---|
format_params(parameters, **kwargs) | Format given keyword arguments as HTTP request parameters. |
get_page_metadata(url) | Get the metadata of a web page. |
get_page_description(html) | Get the description of a web page. |
get_page_image(html, index=0) | Get an image on a web page, the first image by default. |
get_page_favicon(html, url='') | Get the favicon from a web page. |
get_page_theme_color(html) | Get the theme color of a web page. |
get_page_phone_number(html, response, index=0) | Get the first phone number on a web page. |
get_page_email(html, response) | Get an email on a web page, the last email by default. |
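As a rough picture of the kind of extraction these tools perform, the sketch below pulls a description, theme color, favicon, and email addresses out of raw HTML with the standard library alone. It does not use or reproduce cannlytics.data.web: the MetaParser class and page_details function are hypothetical, and the choice of meta tags and the email pattern are assumptions.

```python
import re
from html.parser import HTMLParser


class MetaParser(HTMLParser):
    """Collect a few head-level details from an HTML document."""

    def __init__(self):
        super().__init__()
        self.meta = {}
        self.favicon = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and "content" in attrs:
            # Meta tags are keyed by either `name` or `property`.
            key = attrs.get("name") or attrs.get("property")
            if key:
                self.meta[key.lower()] = attrs["content"]
        elif tag == "link" and "icon" in (attrs.get("rel") or "").lower().split():
            # Matches rel="icon" as well as rel="shortcut icon".
            self.favicon = attrs.get("href")


def page_details(html):
    """Return selected metadata and any email addresses found in `html`."""
    parser = MetaParser()
    parser.feed(html)
    # A deliberately simple email pattern; real pages may need more care.
    emails = re.findall(r"[\w.+-]+@[\w-]+\.[\w.-]+", html)
    return {
        "description": parser.meta.get("description"),
        "theme_color": parser.meta.get("theme-color"),
        "favicon": parser.favicon,
        "emails": emails,
    }


# Hypothetical page content.
html = """
<html><head>
<meta name="description" content="Cannabis analytics.">
<meta name="theme-color" content="#ffffff">
<link rel="shortcut icon" href="/favicon.ico">
</head><body><p>Contact hello@example.com today</p></body></html>
"""
details = page_details(html)
print(details)
```

Returning every email found leaves the choice of first or last match to the caller, for example details["emails"][-1] for the last one.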