Cannabis Licenses
Table of Contents
- Table of Contents
- Dataset Description
- Dataset Summary
- Dataset Structure
- Data Instances
- Data Fields
- Data Splits
- Dataset Creation
- Curation Rationale
- Source Data
- Data Collection and Normalization
- Personal and Sensitive Information
- Considerations for Using the Data
- Social Impact of Dataset
- Discussion of Biases
- Other Known Limitations
- Additional Information
- Dataset Curators
- License
- Citation
- Contributions
Dataset Description
- Homepage: cannlytics/cannlytics
- Repository: https://huggingface.co/datasets/cannlytics/cannabis_licenses
- Point of Contact: dev@cannlytics.com
Dataset Summary
Cannabis Licenses is a collection of cannabis license data for each state that permits adult-use cannabis. The dataset also includes a sub-dataset, `all`, that includes all licenses.
Dataset Structure
The dataset is partitioned into subsets: one for each state and one for the aggregate. The table below lists the code and status of each subset.
State | Code | Status |
---|---|---|
All | all | ✅ |
Alaska | ak | ✅ |
Arizona | az | ✅ |
California | ca | ✅ |
Colorado | co | ✅ |
Connecticut | ct | ✅ |
Delaware | de | ✅ |
Illinois | il | ✅ |
Maine | me | ✅ |
Maryland | md | ⚠️ Under development |
Massachusetts | ma | ✅ |
Michigan | mi | ✅ |
Missouri | mo | ✅ |
Montana | mt | ✅ |
Nevada | nv | ✅ |
New Jersey | nj | ✅ |
New Mexico | nm | ✅ |
New York | ny | ⚠️ Under development |
Oregon | or | ✅ |
Rhode Island | ri | ✅ |
Vermont | vt | ✅ |
Virginia | va | ⏳ Expected 2024 |
Washington | wa | ✅ |
The following states have issued medical cannabis licenses, but are not (yet) included in the dataset:
- Alabama
- Arkansas
- District of Columbia (D.C.)
- Florida
- Kentucky (2024)
- Louisiana
- Minnesota
- Mississippi
- New Hampshire
- North Dakota
- Ohio
- Oklahoma
- Pennsylvania
- South Dakota
- Utah
- West Virginia
Data Instances
You can load the licenses for each state. For example:
from datasets import load_dataset
# Get the licenses for a specific state, for example California.
dataset = load_dataset('cannlytics/cannabis_licenses', 'ca')
data = dataset['data']
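As a minimal sketch of choosing a subset by state name, the snippet below maps a few state names to the codes listed in the table above and passes the result to `load_dataset`. The names `SUBSET_CODES`, `subset_code`, and `load_state_licenses` are hypothetical helpers for illustration only; they are not part of the dataset or of the `datasets` library. The `'data'` split name follows the example above.

```python
# A few subset codes copied from the table above (illustrative, not exhaustive).
SUBSET_CODES = {
    "All": "all",
    "Alaska": "ak",
    "California": "ca",
    "Colorado": "co",
    "Washington": "wa",
}


def subset_code(state_name: str) -> str:
    """Return the subset code for a state name, or raise if it is not mapped."""
    key = state_name.strip().title()
    if key not in SUBSET_CODES:
        raise ValueError(f"No subset code mapped for: {state_name!r}")
    return SUBSET_CODES[key]


def load_state_licenses(state_name: str):
    """Load the licenses for one state (hypothetical convenience wrapper)."""
    # Imported lazily so that the lookup helper works without `datasets` installed.
    from datasets import load_dataset

    dataset = load_dataset("cannlytics/cannabis_licenses", subset_code(state_name))
    return dataset["data"]
```

Calling `load_state_licenses("California")` would then request the `ca` subset, assuming the package is installed and the dataset is reachable.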
Data Fields
Below is a non-exhaustive list of fields that you may expect to find for each observation. These fields are used to standardize the various data that are encountered.
Field | Example | Description |
---|---|---|
id | "1046" | A state-unique ID for the license. |
license_number | "C10-0000423-LIC" | A unique license number. |
license_status | "Active" | The status of the license. Only licenses that are active are included. |
license_status_date | "2022-04-20T00:00" | The date the status was assigned, an ISO-formatted date if present. |
license_term | "Provisional" | The term for the license. |
license_type | "Commercial - Retailer" | The type of business license. |
license_designation | "Adult-Use and Medicinal" | A state-specific classification for the license. |
issue_date | "2019-07-15T00:00:00" | An issue date for the license, an ISO-formatted date if present. |
expiration_date | "2023-07-14T00:00:00" | An expiration date for the license, an ISO-formatted date if present. |
licensing_authority_id | "BCC" | A unique ID for the state licensing authority. |
licensing_authority | "Bureau of Cannabis Control (BCC)" | The state licensing authority. |
business_legal_name | "Movocan" | The legal name of the business that owns the license. |
business_dba_name | "Movocan" | The name the license is doing business as. |
business_owner_name | "redacted" | The name of the owner of the license. |
business_structure | "Corporation" | The structure of the business that owns the license. |
activity | "Pending Inspection" | Any relevant license activity. |
premise_street_address | "1632 Gateway Rd" | The street address of the business. |
premise_city | "Calexico" | The city of the business. |
premise_state | "CA" | The state abbreviation of the business. |
premise_county | "Imperial" | The county of the business. |
premise_zip_code | "92231" | The zip code of the business. |
business_email | "redacted@gmail.com" | The business email of the license. |
business_phone | "(555) 555-5555" | The business phone of the license. |
business_website | "cannlytics.com" | The business website of the license. |
parcel_number | "A42" | An ID for the business location. |
premise_latitude | 32.69035693 | The latitude of the business. |
premise_longitude | -115.38987552 | The longitude of the business. |
data_refreshed_date | "2022-09-21T12:16:33.3866667" | An ISO-formatted time when the license data was updated. |
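The example date values above appear at different precisions: minutes only, whole seconds, and seven fractional-second digits. The sketch below shows one way to parse all three shapes with the standard library. The helper name `parse_license_date` is hypothetical, and the sketch assumes that the date fields in the data follow the shapes of the examples, which may not hold for every state.

```python
import re
from datetime import datetime
from typing import Optional

# Matches the fractional-seconds part of a timestamp, e.g. ".3866667".
_FRACTION = re.compile(r"\.(\d+)")


def _to_microseconds(match: "re.Match") -> str:
    """Trim or pad fractional seconds to exactly six digits."""
    return "." + match.group(1)[:6].ljust(6, "0")


def parse_license_date(value: Optional[str]) -> Optional[datetime]:
    """Parse an ISO-formatted date field, returning None when it is missing."""
    if not value:
        return None
    # Older Python versions accept only 3 or 6 fractional digits,
    # so normalize the fraction before parsing.
    normalized = _FRACTION.sub(_to_microseconds, value, count=1)
    return datetime.fromisoformat(normalized)


print(parse_license_date("2022-09-21T12:16:33.3866667"))
# 2022-09-21 12:16:33.386666
```

Truncating rather than rounding the seventh digit keeps the helper simple; the difference is under a microsecond.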
Data Splits
The data is split into subsets by state. You can retrieve all licenses by requesting the `all` subset.
from datasets import load_dataset
# Get all cannabis licenses.
dataset = load_dataset('cannlytics/cannabis_licenses', 'all')
data = dataset['data']
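Because the `all` subset combines every state, a common first step is to see how many licenses each state contributes. The sketch below counts records by the `premise_state` field using plain dictionaries. The helper `count_by_state` and the sample records are hypothetical (only the first license number comes from the field examples above); with real data, you would pass the rows of the loaded split instead, assuming each row behaves like a dictionary.

```python
from collections import Counter
from typing import Dict, Iterable


def count_by_state(records: Iterable[Dict]) -> Counter:
    """Count licenses per state, skipping records without a state."""
    return Counter(
        record["premise_state"]
        for record in records
        if record.get("premise_state")
    )


# Hypothetical sample records standing in for rows of the `all` subset.
sample = [
    {"license_number": "C10-0000423-LIC", "premise_state": "CA"},
    {"license_number": "HYPOTHETICAL-001", "premise_state": "CA"},
    {"license_number": "HYPOTHETICAL-002", "premise_state": "WA"},
    {"license_number": "HYPOTHETICAL-003", "premise_state": None},
]

counts = count_by_state(sample)
print(counts.most_common())
# [('CA', 2), ('WA', 1)]
```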
Dataset Creation
Curation Rationale
Data about organizations operating in the cannabis industry for each state is valuable for research.
Source Data
Data Collection and Normalization
In the `algorithms` directory, you can find the algorithms used for data collection. You can use these algorithms to recreate the dataset. First, you will need to clone the repository:
git clone https://huggingface.co/datasets/cannlytics/cannabis_licenses
You can then install the Python (3.9+) requirements for the algorithms:
cd cannabis_licenses
pip install -r requirements.txt
Then you can run all of the data-collection algorithms:
python algorithms/main.py
Or you can run each algorithm individually. For example:
python algorithms/get_licenses_ny.py
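If you want to refresh only a handful of states, the individual commands can be scripted. The sketch below assumes that every state's script follows the `get_licenses_<code>.py` naming seen in the New York example; that pattern is inferred from a single file name and may not hold for every state. The helpers `algorithm_script` and `run_algorithms` are hypothetical and are not part of the repository.

```python
import subprocess
import sys
from pathlib import Path
from typing import Iterable, List


def algorithm_script(code: str, directory: str = "algorithms") -> Path:
    """Build the assumed script path for a state code, e.g. 'ny'."""
    return Path(directory) / f"get_licenses_{code.lower()}.py"


def run_algorithms(codes: Iterable[str], directory: str = "algorithms") -> List[str]:
    """Run the script for each state code and return the codes with no script."""
    missing = []
    for code in codes:
        script = algorithm_script(code, directory)
        if not script.exists():
            # The naming pattern is an assumption, so report rather than fail.
            missing.append(code)
            continue
        # Use the current interpreter so the installed requirements are available.
        subprocess.run([sys.executable, str(script)], check=True)
    return missing
```

Run from the repository root, `run_algorithms(["ny", "ca"])` would execute each script that exists and return the codes for which no script was found.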
Personal and Sensitive Information
This dataset includes names of individuals, public addresses, and contact information for cannabis licensees. Take care to use these data points in a legal manner.
Considerations for Using the Data
Social Impact of Dataset
Arguably, substantial social impact could result from the study of permitted adult-use cannabis. Researchers and data consumers alike should therefore take the utmost care in the use of this dataset.
Discussion of Biases
Cannlytics is a for-profit data and analytics company that primarily serves cannabis businesses. The data are not randomly collected and thus sampling bias should be taken into consideration.
Other Known Limitations
The data is for adult-use cannabis licenses. It would be valuable to include medical cannabis licenses too.
Additional Information
Dataset Curators
Curated by 🔥Cannlytics
contact@cannlytics.com
License
Copyright (c) 2022-2023 Cannlytics and the Cannabis Data Science Team
The files associated with this dataset are licensed under a
Creative Commons Attribution 4.0 International license.
You can share, copy and modify this dataset so long as you give
appropriate credit, provide a link to the CC BY license, and
indicate if changes were made, but you may not do so in a way
that suggests the rights holder has endorsed you or your use of
the dataset. Note that further permission may be required for
any content within the dataset that is identified as belonging
to a third party.
Citation
Please cite the following if you use the code examples in your research:
@misc{cannlytics2023,
title={Cannabis Data Science},
author={Skeate, Keegan and O'Sullivan-Sutherland, Candace},
journal={https://github.com/cannlytics/cannabis-data-science},
year={2023}
}
Contributions
Thanks to 🔥Cannlytics, @candy-o, @hcadeaux, @keeganskeate, and the entire Cannabis Data Science Team for their contributions.