Sproutl Hackathon 2022

What is a Hackathon?

A “Hackathon” is a social event that brings together experts, computer programmers and other interested people to improve upon or build a new high-quality solution to a problem through cross-functional collaboration.

Why I Ran a Company Hackathon

I joined a startup called Sproutl in early 2021 whose mission was to democratise access to gardening for all by creating a curated e-commerce platform through which people could purchase gardening items and plants from almost any garden centre in the UK, even including Kew Gardens. We went on to raise a $9m seed round and the company scaled up to 45 employees by mid-2022.

Given the increased size of the company, the level of contribution each individual employee could make at the idea stage of new features began to decrease.

Each employee in the company was carefully hired to not only be highly skilled but also to have significant proactiveness and ability to detect opportunities. To ensure this wasn't wasted, I proposed that we hold a 2-day Hackathon at the end of the 2nd quarter 2022. This was accepted by leadership and I was put in charge of organising things on the condition that I wouldn't let it impact my day-to-day work. I took this as a bit of a challenge and so was extra motivated to ensure things went swimmingly!

How I Put Things Together (High Level)

I decided that this should be a 2-day event.
Anyone can propose a problem/idea for the Hackathon.
There will be a 10-day period for people to propose ideas in a table in a Notion page I set up.
Someone has to volunteer as the “functional expert” for each idea/problem, and they have to be available at least part time during the Hackathon.
Ideas are all presented to those taking part in the Hackathon once the proposal period is over.
People choose the 3 ideas they would most like to work on in a Google Form.
Cross-functional teams of 2-6 members are formed by event organisers based on this.
There will be snacks and drinks available throughout the days as well as dinner (a poll was done to determine preferences).
A social at a pub at the end of the second day.

What Were the Criteria for Idea/Problem Proposal?

All ideas have to be:

Sproutl related
Value adding
Achievable in ~2 days

What should a proposal include?

Idea summary
Any datasets/articles/images that can help or link to these
Idea details

Example ideas:

Sproutl AR - show how plants/pots would look in your veranda/garden
Catalogue enrichment using AI - auto-recommend product category and other details

Deliverables:

Shippable/almost shippable solution
Learnings (especially if problem wasn’t solvable)
3-5 minute presentation

Judging Criteria:

Value add to Sproutl (shippable/almost shippable solution)
Quality, clarity and engagement of presentation
Useful learnings gained during the Hackathon

Independent judging of presentations by Andrew Robb, the former COO of Farfetch

My Hackathon Idea

Due to the high growth rate of the startup, we were gaining access to new SKUs and items to sell on the e-commerce platform faster than we were capable of doing all the admin for these items. Some of this admin included writing nice yet accurate descriptions for them, determining the item type as well as attributes such as height, width, diameter, fragility level and weight amongst others.

My proposal was to train AI models using the vast amounts of data we had already gathered from the 20,000 SKUs already deployed to partially or fully automate some of the admin required in the SKU deployment process.

Introducing E-LLA by Team Creative BrAIns

How We Determined The Exact Problem to Tackle

We analysed domain knowledge regarding each task involved in getting a new SKU live.
Two of our teammates, Ella and Lucie, were critical to the SKU creation process, and their knowledge was crucial.
After determining which tasks were most automatable within the time we had whilst also providing significant value to Sproutl, we decided to proceed with item description generation.
Ella had written the majority of item descriptions manually up to this point, so we would be training our model on her writing style. Hence the AI model would be called E-LLA (Ecommerce Labelling & Linguistics Automation).

How Was E-LLA Built?

We determined key item attributes used in manual description creation.
We retrieved and cleaned these from our Data Warehouse.
These were then reformatted into a form accepted by OpenAI.
We trained OpenAI’s GPT-3 Davinci model using data from 631 pots (costing $10.13 of my OpenAI credits).
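To make the reformatting step more concrete, here is a minimal sketch of how catalogue rows could be turned into training data. At the time, OpenAI's GPT-3 fine-tuning took JSON Lines input, with one prompt/completion pair per line. The attribute names, the separator and the stop marker below are assumptions for illustration and are not our actual schema or settings.

```python
import json

# Hypothetical separator and stop marker: a fixed prompt suffix and a fixed
# completion ending make it clear where the prompt stops and the text ends.
PROMPT_SEPARATOR = "\n\n###\n\n"
COMPLETION_STOP = " END"


def to_training_record(item: dict) -> dict:
    """Turn one catalogue row into a prompt/completion pair."""
    # Attribute names are illustrative, not the real warehouse columns.
    attributes = [
        f"Name: {item['name']}",
        f"Colour: {item['colour']}",
        f"Material: {item['material']}",
    ]
    return {
        "prompt": "\n".join(attributes) + PROMPT_SEPARATOR,
        # The completion starts with a space and ends with the stop marker.
        "completion": " " + item["description"].strip() + COMPLETION_STOP,
    }


def to_jsonl(items: list[dict]) -> str:
    """Serialise records as JSON Lines: one JSON object per line."""
    return "\n".join(
        json.dumps(to_training_record(item), ensure_ascii=False) for item in items
    )


# A single made-up row based on the example product in this post.
example_items = [
    {
        "name": "Ivory Fence & Balcony Hanging Planter",
        "colour": "Ivory",
        "material": "Powder-coated steel",
        "description": "Create an elegant balcony space with the ivory coloured "
        "Fence & Balcony Hanging Planter.",
    }
]

jsonl_text = to_jsonl(example_items)
print(jsonl_text)
```

Each line of the output is one training example, so a file built from 631 pots would have 631 lines.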

What Were the Results?
The result was a model that could successfully generate item descriptions in the existing Sproutl writing style. Example generated description for "Ivory Fence & Balcony Hanging Planter":

Create an elegant balcony space with the ivory coloured Fence & Balcony Hanging Planter. 

Perfect for housing any plants and blooms, it also has two hooks at the base so you can easily drape any hanging baskets or containers. 
Made from tough powder-coated steel, this planter has the versatility to be used both indoors and out.

Next Steps?

Auto-retrieve data from new partially created SKUs and populate the PIM with a draft AI-generated description to be approved or edited by the Catalogue team.
Auto-determine the item categories as well. This would use pretty much the same dataset, but it needs a new classification model, so it would take an additional couple of hours of engineering time.

This project ended up winning the Hackathon.

What is GPT-3 (High-Level)

Tech-Speak Explanation
It’s an autoregressive language model that uses deep learning to produce human-like text.

What Does That Mean?
GPT-3 is a smart AI model specialised to produce human text.
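
To give a feel for what "autoregressive" means, here is a toy sketch that builds a sentence one word at a time, where each new word is picked based on what came before. The lookup table is entirely made up, and it only looks at the previous word. GPT-3 instead uses a neural network that takes the whole preceding text into account, so this illustrates the idea only, not how GPT-3 is implemented.

```python
import random

# Toy "language model": for each word, the words that may follow it.
# Made up for illustration; a real model learns probabilities from data.
NEXT_WORDS = {
    "<start>": ["this", "a"],
    "this": ["planter"],
    "a": ["planter"],
    "planter": ["is", "looks"],
    "is": ["elegant", "sturdy"],
    "looks": ["elegant", "sturdy"],
    "elegant": ["<end>"],
    "sturdy": ["<end>"],
}


def generate(rng: random.Random) -> list[str]:
    """Generate words one at a time until the end marker is chosen."""
    words = ["<start>"]
    while words[-1] != "<end>":
        # Autoregressive step: the next word is sampled given the text so far
        # (here, only the most recent word).
        words.append(rng.choice(NEXT_WORDS[words[-1]]))
    return words[1:-1]  # drop the start and end markers


rng = random.Random()
for _ in range(3):
    print(" ".join(generate(rng)))
```

Because each step samples from several options, running it repeatedly can give different sentences for the same starting point.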

Project Limitations

The AI is not deterministic: it can generate a different description each time for the same product, and once in a while these aren’t up to our standards. To mitigate this, we made the AI generate 3 descriptions and pick the best one of them.
We initially tried training using all our data, which was about 19,000 items. This would have taken a considerable amount of time to train and cost ~$300. We eventually chose to train using only ¼ of our pots.
We trained the model to successfully produce new lines, apostrophes and the like. However, it returns these in Unicode (for example ’), so they need to be parsed, or the model retrained without empty lines, before descriptions are loaded to our PIM.
The model currently isn't optimised to generate SEO friendly content.
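
As a rough sketch of the parsing mentioned above, the snippet below swaps typographic quotes for plain ones and drops empty lines from a generated description. Which characters need replacing, and whether empty lines should be dropped rather than kept as paragraph breaks, are assumptions here and would depend on what the PIM accepts.

```python
# Typographic characters mapped to plain ASCII equivalents.
# This mapping is an assumption; extend it to whatever the model returns.
REPLACEMENTS = {
    "\u2019": "'",  # right single quotation mark (curly apostrophe)
    "\u2018": "'",  # left single quotation mark
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
}


def clean_description(text: str) -> str:
    """Normalise quotes and remove empty lines from a generated description."""
    for fancy, plain in REPLACEMENTS.items():
        text = text.replace(fancy, plain)
    # Trim whitespace on each line, then keep only non-empty lines.
    lines = [line.strip() for line in text.splitlines()]
    return "\n".join(line for line in lines if line)


# Made-up model output containing a curly apostrophe and blank lines.
raw = "It\u2019s made from tough steel. \n\n\nUse it indoors or out."
print(clean_description(raw))
```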

Hackathon Feedback