A Pipeline Technology For A New Livestock Industry: The PigLife Dataset
MICHAEL SWEENEY
URBANA, ILLINOIS
The livestock industry is changing quickly. Pig populations are increasing, the number of farmers is decreasing, and tools are needed to address the widening gap.
“If you think of [PigLife] as a pipeline technology, we identified very early some of those barriers to technology development,” says Angela Green-Miller, the Labor Optimization for Livestock Co-Lead for AIFARMS and Associate Professor of Agricultural and Biological Engineering in the College of Agricultural, Consumer and Environmental Sciences at the University of Illinois Urbana-Champaign. “We need folks who work in early technology achieving solutions to those barriers before we can get to a tool that would go out to a farm.”
The lack of benchmark data sets and robust fundamental algorithms has hindered widespread commercial applications. Now, researchers everywhere have access to one of the most complete pig data sets available. Ideally, the data would be used in applications such as computer vision, an affordable, non-invasive, and precise way to monitor an individual animal across its entire life: an entire PigLife.
“The goal isn’t that we’re going to have computer vision fixing the problems; you still need people in the barn,” Green-Miller says. “But there’s only so many hours in a day…having tools that can help those people know where to go, and what problems they’re looking for, would be a big pivot for the industry.”
The best part: researchers aren’t getting muddy in the process. The idea is to let any data scientist, even those who’ve never seen a pig in their life, try their hand at creating something useful.
“So people who know nothing about animal systems, but they know a lot about building models and a lot about creating tools and techniques, can help us,” Green-Miller says.
She says that PigLife is a crucial first step in a long process that will evolve our understanding from “what is with the animals,” to “how are the animals?”
The Problem
There are clear gaps where PigLife can step in and increase efficiency. The loss and death rate in pig production in the United States was around 10% in 2020, with a majority attributed to unknown reasons. For an industry that cost farmers a total of 28.1 billion dollars in 2022, losses on that scale from undetected problems are unsustainable, especially while livestock numbers are increasing and the number of farmers is decreasing.
PigLife, under AIFARMS, is driven to address these pitfalls and ensure future agriculture is environmentally friendly, sustainable, affordable, and accessible to diverse farming communities.
The team wanted to tackle the whole problem and get a product out to market. When approaching the first step, however, they saw a massive lack of usable data, so they decided to scale down.
“The hardest thing to decide is what should be in phase one,” Green-Miller says. “Because we can’t do everything we would like to do, so what’s a step that would bring value?”
They also had to ask themselves what would be useful for others, not just themselves. They were making this for anyone and everyone.
“The point wasn’t to make a data set to duplicate what we were working on. The point was to make a data set to bring in new innovation. And so how do we make a data set that is valuable to people that we don’t know what they do?” Green-Miller adds.
The Data
Accessibility is important for bringing in fresh minds, but it is of little use if the data isn’t reliable. According to Ana Lučić, the Assistant Director for Data Management for AIFARMS, the team first made sure to raise the data’s quality and establish its credibility in an environment where neither is guaranteed.
“You need to be able to know where your data came from,” Lučić says. “This is a very important question: how did it originate? Which we don’t always know with AI models.”
Lučić and the team worked with the Office of Technology Management to add a custom license, a deliberate and responsible choice over Creative Commons. Users must agree to terms that protect the university, which adds to the data set’s legitimacy while keeping it open and downloadable. Again, this practice is far from universal, but it is one PigLife hopes to encourage.
“Lots of data sets do not contain enough documentation,” Lučić says. “When you find something on the internet, very often there is no license which sets the terms of use, or information about data provenance, or an explanation on how to use the dataset.”
Once you’ve agreed to the terms, you’ll be invited into a resource that attempts to cover the drastically varying world of agricultural data.
Soil quality can differ from farm to farm and, as one can imagine, it doesn’t get easier as you analyze an entire country of differing rainfalls, temperatures, and pest populations.
However, “it’s not practical to build a tool for every individual farm. And right now, that’s mostly where the technology is,” Green-Miller says.
PigLife’s sheer number of data points encourages tools that can overcome variation, but it still doesn’t account for every situation a pig can get into; Green-Miller says it likely covers about half. Even so, researchers have reached out to say that the data set is robust in a way that others aren’t.
Distinguishing itself even more from the typical data set, PigLife was created without a guiding question or a direct problem to solve. With this approach, you lose some scientific rigor in the traditional sense, but you gain enormous flexibility and long-term value, because the data can be repurposed as new questions and new modeling techniques emerge.
“There wasn’t a study. We made the decision early that this is a data set for modelers, not a data set for hypothesis-driven scientists. That’s a little different approach than what’s out there for most data sets,” Green-Miller says.
Today, a data set this flexible must be usable not only by humans but also by AI agents. Again, the PigLife team thought of this.
The data set is published with Croissant metadata, a format that is machine-learning-ready and accessible to AI crawlers.
Lučić explains that all of this is necessary, even the niceties of recording when the data was collected, who entered it, what the production cycles are, and how to cite it. The team is looking to improve livestock management and raise the bar for data organization while doing so.
“We were trying to raise the standard in terms of how data is documented,” Lučić said. “Because we really believe that it is helpful not only for the people who are collecting the data, but also for anybody who will be using it.”
What’s Next
So, the first step is complete. The data patiently awaits a proper application.
“PigLife is still pretty early. It’s not close to getting to a tool that we’re going to deploy on a commercial farm, but that’s what we’d like to be moving towards,” Green-Miller said.
And pigs are just the beginning. The researchers at Illinois are teaming up with colleagues at Tuskegee University to raise goat farming to the same level.
Challenges await developers looking to create a tool. Green-Miller explains the nuances of bridging the separate domains of data science and farming: knowledge must be transferred between scientists and farmers, agricultural terminology must be integrated to improve explainability, and the interaction between autonomous systems and humans must be smoothed out.
Integrating farming, traditionally a very physical occupation, with AI-driven systems will be a process in itself. Significant technical advances are required to enable and encourage farmers to adopt autonomous systems.
Although the eventual product is still some way off, the work gives those in the livestock industry assurance that these problems are being addressed.
“I think the biggest direct benefit to our producers is awareness that they exist and have problems that we could potentially be solving with digital tools,” Green-Miller added. “It doesn’t really impact their day-to-day yet, but they’re very supportive of our ultimate end goal.”
As for the PigLife team, they have realized the need for AI-ready data and hope to use what they’ve learned to produce more novel agricultural data sets.
There are also hopes for open-world object detection and segmentation, and open-world hyperspectral data sets for phenotype prediction. These advancements would upgrade detection models from simply recognizing pigs’ movement to recognizing a pig’s physical traits and even identifying something they have never seen before.
For the team, this is a feat of interdisciplinary collaboration as much as it is a feat of data science. It is a prime example of how data scientists can work in fields they’re strangers to, and how domain-specific scientists can create revolutionary data sets.
“Even if we did nothing else with PigLife,” Green-Miller said, “it’s a great success story for how we came together as a team.” ∆
MICHAEL SWEENEY
UNIVERSITY OF ILLINOIS