Entities and Generative Search in AGI
What are Entities in Generative Search?
Imagine you have a big box of Lego blocks. These blocks are like "entities" in the computer world. Each block represents a piece of information, like "dog," "red," or "running."
Now, think about how you usually look for things on the internet. It's like trying to find a specific Lego block by its colour or shape. You type in words, and the computer tries to find exactly what you asked for.
But now, there's a new way to search. It's called "generative search." It's like having your own Lego builder. Instead of just finding the blocks you asked for, this builder can take those blocks and create whole new things with them.
So, if you ask about "a dog playing in a park," the Lego builder doesn't just show you pictures of dogs and parks. It can actually create a whole new story or picture about a dog playing in a park, even if that exact story or picture didn't exist before.
This new way of searching is exciting because it helps people find answers to more complicated questions. Instead of just finding simple facts, it can create new information by putting different facts together in smart ways - well, that's the ideal scenario.
It's like the difference between finding a picture of a red car and being able to ask, "What would this car look like if it could fly?" The new search can imagine and create an answer for you.
This is changing how we use computers and the internet. It's making it easier for people to find exactly what they're looking for, even if they're not sure how to explain it in simple words - and therein lies the main issue.
If generative search is making up stories, isn't there a danger that the output will be devoid of facts and composed of lies, which will mislead people?
This concern is one of the key challenges with generative search and AI technologies in general.
1. Potential for misinformation: There's a risk of generative search producing content that isn't factual. These systems can sometimes "hallucinate" or generate plausible-sounding but incorrect information. If you use ChatGPT (or similar) regularly, then I am sure you have furrowed your brow at some of the answers it has generated for you.
2. Not making up stories, but combining information: Ideally, generative search isn't supposed to simply make up stories. It's designed to combine existing knowledge in new ways. However, the quality of the output depends on the quality and accuracy of the data it's trained on.
3. Importance of reliable sources: The best generative search systems are trained on verified, factual information from reputable sources. They should also be designed to distinguish between facts and speculation.
4. Transparency is key: Good implementations of this technology should be clear about what is factual information and what is generated or speculative content.
5. Critical thinking still necessary: Users need to approach generative search results with the same critical thinking they'd apply to any information source. It's a tool to assist, not replace, human judgment.
6. Ongoing development: Researchers and companies are actively working on improving the accuracy and reliability of these systems, including developing better ways to cite sources and verify information.
7. Potential benefits: Despite the risks, when used responsibly, generative search can help synthesise information in helpful ways, potentially leading to new insights or making complex topics more accessible.
8. Ethical considerations: The developers and users of this technology need to consider the ethical implications and work to minimise potential harm from misinformation.
All AI generative search companies are competing to eliminate hallucinations, but have yet to reach that perfect state. They are employing several strategies to minimise hallucinations and improve the accuracy of their results.
1. Improved training data:
Companies are focusing on using high-quality, verified data sources for training. This includes curating datasets to remove inaccurate or low-quality information.
2. Fact-checking mechanisms:
Some systems are being designed with built-in fact-checking capabilities. These compare generated content against a database of known facts.
3. Source attribution:
Efforts are being made to enable AI models to cite their sources, allowing users to verify information. Perplexity is particularly good at source citation.
4. Uncertainty quantification:
Models are being developed to express levels of confidence in their outputs, indicating when they're less certain about information.
5. Retrieval-augmented generation:
This technique combines language models with the ability to retrieve and reference specific documents, reducing reliance on potentially flawed internal knowledge.
6. Multi-modal learning:
Incorporating different types of data (text, images, etc.) can help create more robust and accurate models.
7. Human-in-the-loop systems:
Some approaches involve human experts reviewing and correcting AI outputs to improve accuracy over time.
8. Adversarial training:
Models are challenged with difficult or tricky questions to improve their ability to avoid mistakes or hallucinations.
9. Constrained decoding:
This involves setting rules or restrictions on what the AI can generate, helping to keep outputs more factual and relevant.
10. Continual learning and updates:
Regular updates to the models with new, verified information help keep the knowledge base current and accurate.
These efforts are ongoing, and different companies may emphasise different approaches.
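Of the strategies above, retrieval-augmented generation (point 5) is the easiest to picture in code. The sketch below is a deliberately toy illustration, not a real system: the three-document "corpus", the word-overlap scoring, and the answer template are all invented for this example. Production systems use vector embeddings and a large language model in place of both, but the shape is the same - retrieve supporting documents first, then answer only from what was retrieved, citing each source.

```python
# Toy retrieval-augmented generation (RAG) sketch.
# The corpus, scoring, and answer template are invented for illustration;
# real systems use embeddings and an LLM instead of word overlap.

CORPUS = {
    "doc1": "Dogs are domesticated mammals, often kept as pets.",
    "doc2": "Parks are public green spaces used for recreation.",
    "doc3": "The stock market rose sharply on Tuesday.",
}

def retrieve(query: str, corpus: dict, k: int = 2) -> list:
    """Rank documents by how many query words they share (toy scoring)."""
    query_words = set(query.lower().split())
    scored = []
    for doc_id, text in corpus.items():
        overlap = len(query_words & set(text.lower().split()))
        scored.append((overlap, doc_id))
    scored.sort(reverse=True)
    # Keep only documents with at least one matching word.
    return [doc_id for overlap, doc_id in scored[:k] if overlap > 0]

def generate_answer(query: str, corpus: dict) -> str:
    """Answer only from retrieved passages, citing each source by id."""
    sources = retrieve(query, corpus)
    if not sources:
        # Refusing is safer than inventing an answer (a "hallucination").
        return "No supporting documents found."
    cited = "; ".join(f"{corpus[s]} [{s}]" for s in sources)
    return f"Based on the retrieved sources: {cited}"

print(generate_answer("are parks public spaces", CORPUS))
```

The key design point is the refusal branch: because the answer is built only from retrieved text, a query with no supporting documents produces an explicit "not found" rather than a plausible-sounding guess, which is exactly the failure mode RAG is meant to reduce.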
It's important to note that completely eliminating hallucinations is extremely challenging and may not be fully achievable with current technology. The goal is to minimise them as much as possible while being transparent about the limitations of the system. So before you publish anything produced with AI generative search, it is essential to edit the piece with your own eyes and to check that the source citations are real.
With guidance comes clarity and reassurance. Give us a call at Digital Advantage to understand your options - digitaladvantage.me