The rise of large language models (LLMs) is changing how people get legal help online, especially for country-specific areas of law. This blog post shows how to build a legal chatbot using these AI tools. It is designed to make advice on Italian law easier to access and understand for everyone, demonstrating that cutting-edge technology can put expert knowledge at our fingertips in a user-friendly way.

We set out to simplify access to Italian law advice on divorce and inheritance with Python, Streamlit, and OpenAI's LLMs (the app uses the `gpt-4o-mini` model). Our work included processing complex legal documents using `OpenAIEmbeddings` and `ChatOpenAI`, making it easier for the chatbot to understand and use them. With a dedicated function, `law_content_splitter`, we broke the legal texts into sections, helping our chatbot handle legal details more smoothly.
The chatbot's interaction is managed by recognizing greetings with the `is_greeting` function and generating detailed, context-aware responses through the `chatbot` function. It is equipped with a memory system, `ConversationBufferMemory`, to recall past interactions, enhancing the user experience with more personalized advice. The `RetrievalQA` and `PromptTemplate` mechanisms drive dynamic response generation, and Streamlit provides the user interface, rounding out a highly interactive and informative legal assistant.
This story of LLMs and legal documents coming together showcases our commitment to providing a tool that simplifies the legal consultation process, making it more approachable and understandable for users seeking guidance on Italian law.
How Our Legal Bot Operates:
Step 1: Ingestion
First, the documents are prepared for the LLM in an ingestion step.
This crucial phase turns each document into an index, typically a vector store, ready for the model to use.
- Loading the document
- Splitting the document
- Creating embeddings
- Storing the embeddings in a database (a vector store)
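The four ingestion steps can be sketched in a few lines of plain Python. This is a toy illustration of the flow, not the app's actual LangChain code: the bag-of-words `Counter` stands in for `OpenAIEmbeddings`, and a list of tuples stands in for the Chroma vector store.

```python
# Toy ingestion pipeline: load -> split -> embed -> store.
from collections import Counter

def load_document(text: str) -> str:
    return text  # in the real app this reads a .txt file from ./data

def split_document(text: str, chunk_size: int = 40) -> list[str]:
    # Naive fixed-size splitter; LangChain's CharacterTextSplitter is smarter.
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

def embed(chunk: str) -> Counter:
    # Bag-of-words stand-in for a real embedding vector.
    return Counter(chunk.lower().split())

def ingest(text: str) -> list[tuple[str, Counter]]:
    chunks = split_document(load_document(text))
    return [(c, embed(c)) for c in chunks]  # the "vector store"

store = ingest("Coheirs may always apply for a division of the estate.")
```

In the real app, `CharacterTextSplitter` handles the splitting and Chroma persists the vectors to disk, but the shape of the pipeline is the same.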
Step 2: Generation
Once we have our index or vector store ready, it’s time to use this organized data to come up with answers.
- Accept the user’s question
- Identify the most relevant document for the question
- Pass the question and the document as input to the LLM to generate an answer
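The generation steps can be sketched the same way: embed the question, score it against the stored documents, and pass the best match plus the question to the model. Everything below is a simplified stand-in (toy bag-of-words vectors, hand-written documents, and no actual LLM call), not the LangChain retrieval code.

```python
# Toy generation step: find the most relevant document for a question.
from collections import Counter
import math

def embed(text: str) -> Counter:
    return Counter(text.lower().split())  # same toy embedding as ingestion

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

docs = [
    "Coheirs may apply for division of the estate; a judge may defer it for minors.",
    "Divorce by mutual consent requires a period of separation.",
]
question = "When can coheirs apply for division?"

# Pick the document with the highest similarity to the question...
best = max(docs, key=lambda d: cosine(embed(d), embed(question)))
# ...and build the prompt that would be handed to the LLM.
prompt = f"Context: {best}\nQuestion: {question}"
```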
Let’s take a look at how it works:
- Question: “When can coheirs apply for division, especially if there are minors involved?”

App Overview
When the app is launched, LangChain processes the two legal texts, "DIVISION OF ASSETS AFTER DIVORCE.txt" and "INHERITANCE.txt", located in the `./data` folder.
It first breaks each document into smaller sections, generates embedding vectors for those sections, and saves the vectors in the embeddings database at `./docs/chroma`. It then takes the question asked by the user and runs it through the question-answering chain, letting the LLM generate an answer based on the content of these documents.

Implementation
Diving into the code and implementation of the chatbot, note that we chose to deploy the application on HuggingFace Spaces, with Streamlit as our selected stack. This decision reflects a strategic approach: leveraging HuggingFace's robust ecosystem for machine learning models and Streamlit's interactive web application capabilities.

Deployment via HuggingFace Spaces
Transitioning to HuggingFace Spaces provided our project with a supportive AI-focused community and a scalable platform, ensuring our chatbot’s reliable performance.
Another reason for choosing HuggingFace Spaces over GitHub for hosting our Streamlit project was GitHub's 100MB file size limit. While GitHub offers Git Large File Storage (LFS) for handling larger files, integrating it proved challenging, especially with our project's use of ChromaDB. The compatibility issues and complexities of Git LFS made it a less favorable option, prompting our shift to HuggingFace Spaces, which supported our large files without those complications.
File Configurations and codes
This app is designed for the Streamlit Stack, requiring adherence to both Streamlit’s guidelines and HuggingFace Spaces’ rules.

title: Legal Chat Bot
emoji: 🥰
colorFrom: red
colorTo: purple
sdk: streamlit
sdk_version: 1.37.1
app_file: LegalBot.py
pinned: true
- Title: Legal Chat Bot
- Emoji: 🥰, adding a friendly and engaging touch.
- Color Scheme: Transitions from red to purple, symbolizing the app's vibrant and dynamic nature.
- SDK: Built using Streamlit, version 1.37.1, matching the `sdk_version` above.
- Main Application File: "LegalBot.py", the heart of the chatbot's functionality.
- Pinned: Marked as a key application, highlighting its importance and utility for legal consultation within HuggingFace Spaces.
The `requirements.txt` file is an essential component of a Python project, particularly when deploying on platforms like HuggingFace Spaces. It specifies all the dependencies your project needs to run successfully.
streamlit
openai
langchain
langchain-community
langchain-core
langchain-openai
langchain-text-splitters
langsmith
chromadb
tiktoken
streamlit-chat
This file should be included in your project’s root directory, ensuring that when your application is deployed on HuggingFace Spaces or any other platform, it can automatically install these dependencies. This setup guarantees that anyone running your app will have the correct environment set up, making the deployment process smooth and error-free.
The "LegalBot.py" file is the central piece of our legal chatbot application. It's where all the magic happens!
Import Statements
import os
import streamlit as st
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import Chroma
from langchain.chat_models import ChatOpenAI
from langchain.chains import RetrievalQA
from langchain import PromptTemplate
from langchain.memory import ConversationBufferMemory
from langchain.agents import initialize_agent, Tool, AgentExecutor
from langchain.text_splitter import CharacterTextSplitter
import openai
This section imports the essential libraries and modules for the chatbot's functionality. `os` and `streamlit` manage system operations and the web interface, while the `langchain` modules (embeddings, vector store, chat model, chains, prompts, memory, agents, and text splitter) and `openai` provide the tools for language processing and AI-driven chat capabilities.
Setting OpenAI API Key
# Set OpenAI API Key (I used Hugging Face Secrets Environment and Inserted my API Key there)
openai.api_key = os.environ.get("OPENAI_API_KEY")
This crucial line retrieves the OpenAI API key stored in your environment variables. This approach is fundamental for securely using OpenAI’s services by ensuring that your API interactions are authenticated without hardcoding sensitive information into your script. When deploying applications on platforms like HuggingFace Spaces, it’s vital to manage such secrets securely.
HuggingFace Spaces provides a feature for securely storing and accessing secrets, such as API keys, without exposing them in your code. You can use HuggingFace's secret management system to store your `OPENAI_API_KEY` and then access it within your application as shown in the snippet. This method enhances security and simplifies the management of sensitive data.
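To make the failure mode explicit when the secret is missing, the lookup could be wrapped in a small helper. The `get_openai_key` function below is a hypothetical addition, not part of the original code:

```python
import os

def get_openai_key() -> str:
    # Read the key from the environment; on HuggingFace Spaces,
    # secrets configured in the Space settings are injected here.
    key = os.environ.get("OPENAI_API_KEY")
    if not key:
        # Fail fast with a clear message instead of a cryptic API error later.
        raise RuntimeError("OPENAI_API_KEY is not set; add it as a Space secret.")
    return key
```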
Document File Paths
# Document file paths
file1 = "./data/DIVISION OF ASSETS AFTER DIVORCE.txt"
file2 = "./data/INHERITANCE.txt"
These lines define the paths to the legal documents. They are the source of the chatbot's legal knowledge about divorce and inheritance laws.
Function Definitions
# Function to initialize the OpenAI embeddings and model
def openai_setting():
    embedding = OpenAIEmbeddings()
    model_name = "gpt-4o-mini"
    llm = ChatOpenAI(model_name=model_name, temperature=0)
    return embedding, llm
Initializes OpenAI embeddings and the language model, setting the stage for advanced text processing and response generation.
# Function to split the law content
def law_content_splitter(path, splitter="CIVIL CODE"):
    with open(path) as f:
        law_content = f.read()
    law_content_by_article = law_content.split(splitter)[1:]
    text_splitter = CharacterTextSplitter()
    return text_splitter.create_documents(law_content_by_article)
Splits the legal documents into smaller sections, facilitating easier data handling and retrieval.
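The splitting logic can be checked on a toy document: everything before the first "CIVIL CODE" marker (e.g., a preamble) is dropped by the `[1:]` slice, and each remaining piece corresponds to one article block. This is a plain-string sketch of just the split step, before `CharacterTextSplitter` is applied; the sample text is invented for illustration.

```python
# Each occurrence of the marker starts a new article; text before the
# first marker is discarded by the [1:] slice.
sample = (
    "Preamble text.\n"
    "CIVIL CODE Art. 713: Coheirs may always apply for division.\n"
    "CIVIL CODE Art. 715: The judge may defer the division.\n"
)
articles = sample.split("CIVIL CODE")[1:]
```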
# Define the greetings list at a global level
greetings = [
    "hello",
    "hi",
    "hey",
    "greetings",
    "good morning",
    "good afternoon",
    "good evening",
    "hi there",
    "hello there",
    "hey there",
    "whats up",
    "ciao",
    "salve",
    "buongiorno",
    "buona sera",
    "buonasera",
    "buon pomeriggio",
    "buonpomeriggio",
    "come stai",
    "comestai",
    "come va",
    "comeva",
    "come sta",
    "comesta",
]
# Function to determine if input is a greeting
def is_greeting(input_str):
    return any(greet in input_str.lower() for greet in greetings)
This code identifies greeting phrases in user input, adding a layer of user-friendly interaction to the chatbot.
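One subtlety worth noting: because the check tests `greet in input_str`, short greetings like "hi" also match inside unrelated words. A quick check with an abbreviated list (a subset of the full `greetings` list above) shows both the intended behavior and this over-matching:

```python
greetings = ["hello", "hi", "ciao"]  # abbreviated subset for illustration

def is_greeting(input_str):
    return any(greet in input_str.lower() for greet in greetings)

# Straightforward greetings are detected, in English and Italian alike.
# But the substring test also fires on words that merely contain a
# greeting, e.g. "hi" inside "this" -- a limitation of this simple check.
```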
# Function to handle chatbot logic
def chatbot1(question):
    try:
        return agent.run(question)
    except Exception as e:
        return f"I'm sorry, I'm having trouble understanding your question. Error: {str(e)}"

# Function to route user input
def chatbot(input_str):
    # Check if the input starts with a greeting
    if any(input_str.lower().startswith(greet) for greet in greetings):
        # Check if the input contains more than just a greeting
        if len(input_str.split()) <= 3:
            return "Hello! Ask me your question about Italian Divorce or Inheritance Law?"
        else:
            return chatbot1(input_str)
    else:
        return chatbot1(input_str)
`chatbot1()` and `chatbot()`: These functions handle the core logic of processing user queries and generating responses.
Initializing Language Model and Embeddings
# Splitting the content of law documents
divorce_splitted = law_content_splitter(file1)
inheritance_splitted = law_content_splitter(file2)
# Initializing embedding and language model
embedding, llm = openai_setting()
Sets up the necessary components for the chatbot to process text and interact using natural language.
Defining Prompts for Different Legal Areas
# Define the prompts
divorce_prompt = """As a specialized bot in divorce law, you should offer accurate insights on Italian divorce regulations.
You should always cite the article numbers you reference.
Ensure you provide detailed and exact data.
If a query doesn't pertain to the legal documents, you should remind the user that it falls outside your expertise.
You should be adept at discussing the various Italian divorce categories, including fault-based divorce, mutual-consent divorce, and divorce due to infidelity.
You should guide users through the prerequisites and procedures of each divorce type, detailing the essential paperwork, expected duration, and potential legal repercussions.
You should capably address queries regarding asset allocation, child custody, spousal support, and other financial concerns related to divorce, all while staying true to Italian legislation.
{context}
Question: {question}"""
DIVORCE_BOT_PROMPT = PromptTemplate(
    template=divorce_prompt, input_variables=["context", "question"]
)
# Define inheritance prompt
inheritance_prompt = """As a specialist in Italian inheritance law, you should deliver detailed and accurate insights about inheritance regulations in Italy.
You should always cite the article numbers you reference.
When responding to user queries, you should always base your answers on the provided context.
You MUST always cite the specific article numbers you mention and refrain from speculating.
Maintain precision in all your responses.
If a user's question doesn't align with the legal documents, you should point out that it's beyond your domain of expertise.
You should elucidate Italian inheritance law comprehensively, touching on topics such as testamentary inheritance, intestate inheritance, and other pertinent subjects.
Make sure to elaborate on the obligations and rights of inheritors, the methodology of estate distribution, asset assessment, and settling debts, all while adhering to Italian law specifics.
You should adeptly tackle questions about various will forms like holographic or notarial wills, ensuring you clarify their legitimacy within Italian jurisdiction.
Offer advice on creating a will, naming heirs, and managing potential conflicts.
You should provide detailed information on tax nuances associated with inheritance in Italy, inclusive of exemptions, tax rates, and mandatory disclosures.
{context}
Question: {question}"""
INHERITANCE_BOT_PROMPT = PromptTemplate(
    template=inheritance_prompt, input_variables=["context", "question"]
)
Custom prompts for divorce and inheritance law queries are created here, guiding the chatbot on how to respond accurately and contextually.
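Under the hood, `PromptTemplate` substitutes the `{context}` and `{question}` placeholders into the template string. The same effect can be seen with plain `str.format` on a shortened version of the divorce prompt (the context and question below are invented examples):

```python
# Shortened template with the same placeholders as divorce_prompt.
template = """As a specialized bot in divorce law, cite the article numbers you reference.
{context}
Question: {question}"""

# PromptTemplate.format(...) performs essentially this substitution.
filled = template.format(
    context="Art. 3: Divorce may be requested after a period of legal separation.",
    question="How long must spouses be separated before divorce?",
)
```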
Setting Up Chroma Databases and RetrievalQA
Chroma Databases Setup
# Setup for Chroma databases and RetrievalQA
chroma_directory = "./docs/chroma"

inheritance_db = Chroma.from_documents(
    documents=inheritance_splitted,
    embedding=embedding,
    persist_directory=chroma_directory,
)
inheritance = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=inheritance_db.as_retriever(),
    chain_type_kwargs={"prompt": INHERITANCE_BOT_PROMPT},
)

divorce_db = Chroma.from_documents(
    documents=divorce_splitted, embedding=embedding, persist_directory=chroma_directory
)
divorce = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=divorce_db.as_retriever(),
    chain_type_kwargs={"prompt": DIVORCE_BOT_PROMPT},
)
- Chroma Directory: The `chroma_directory` variable specifies the location (`./docs/chroma`) where the embeddings of the processed documents are stored. This directory acts as persistent storage for the vectorized representations of the legal texts, enabling quick and efficient data retrieval.
- Creating Chroma Databases: `Chroma.from_documents()` is called twice, once for each legal domain (inheritance and divorce). It takes three key arguments:
  - `documents`: The split, pre-processed legal documents (`inheritance_splitted` and `divorce_splitted`), divided into manageable sections for easier handling.
  - `embedding`: The embedding model initialized earlier with `openai_setting()`, responsible for converting text into numerical vectors that capture semantic information.
  - `persist_directory`: Points to `chroma_directory`, indicating where the created embeddings should be stored.
These steps transform the raw text of legal documents into structured, searchable databases that the chatbot can query to find relevant legal information.
RetrievalQA Integration
- Initialization: The `RetrievalQA.from_chain_type()` method initializes the RetrievalQA system for each legal area, setting up a question-answering (QA) chain that leverages the previously created Chroma databases.
- Configuration: Each `RetrievalQA` instance is configured with specific parameters:
  - `llm`: The language model initialized in `openai_setting()`, which generates answers based on the retrieved documents.
  - `chain_type`: Set to `"stuff"`, meaning all retrieved document chunks are "stuffed" into a single prompt for the model to answer from.
  - `retriever`: Specifies the Chroma database (`inheritance_db` or `divorce_db`) from which to retrieve relevant document embeddings in response to a query.
  - `chain_type_kwargs`: Additional arguments such as `prompt`, supplying the context and question templates (`INHERITANCE_BOT_PROMPT` and `DIVORCE_BOT_PROMPT`) that guide the language model toward accurate, contextually appropriate answers.
This sophisticated setup enables the chatbot to not only understand and process user queries with high precision but also to fetch and utilize specific pieces of legal information from the stored documents. By leveraging the capabilities of Chroma databases and RetrievalQA, the chatbot offers a powerful tool for delivering detailed legal advice, directly addressing users’ inquiries with informed responses grounded in the actual content of Italian legal documents.
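The "stuff" chain described above has a simple core: retrieve the relevant chunks, concatenate them into one prompt, and ask the model. The sketch below captures that flow with hypothetical stubs in place of the real components (`retrieve` stands in for `divorce_db.as_retriever()`, and `stub_llm` stands in for `ChatOpenAI`); it is an illustration of the idea, not the LangChain implementation.

```python
def retrieve(question: str) -> list[str]:
    # Stub retriever: invented article snippets for illustration.
    return [
        "Art. 713: Coheirs may always apply for division.",
        "Art. 715: The judge may defer division if minors are involved.",
    ]

def stub_llm(prompt: str) -> str:
    # Stand-in for the chat model; just reports how many articles it saw.
    return f"Answered using {prompt.count('Art.')} retrieved articles."

def stuff_chain(question: str) -> str:
    # "Stuff" all retrieved chunks into a single prompt, then call the model.
    context = "\n".join(retrieve(question))
    prompt = f"{context}\nQuestion: {question}"
    return stub_llm(prompt)
```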
Defining Tools for the ChatBot
# Define the tools for the chatbot
tools = [
    Tool(
        name="Divorce Italian law QA System",
        func=divorce.run,
        description="Useful for when you need to answer questions about divorce laws in Italy. Also provides the number of the article you use.",
    ),
    Tool(
        name="Inheritance Italian law QA System",
        func=inheritance.run,
        description="Useful for when you need to answer questions about inheritance laws in Italy. Also provides the number of the article you use.",
    ),
]
- Tools Array: The `tools` array defines the specific tools the chatbot can use to answer questions about divorce and inheritance law.
- Each `Tool` object is defined with:
  - `name`: A descriptive name indicating its area of expertise (`"Divorce Italian law QA System"` and `"Inheritance Italian law QA System"`).
  - `func`: The function called when the tool is used (`divorce.run` and `inheritance.run`), which triggers the RetrievalQA process set up for each legal domain.
  - `description`: A brief explanation of what each tool does, emphasizing that it can answer questions about a specific legal area and cite relevant article numbers.
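At its core, the agent's job is to pick one of these tools and call its `func`. The stripped-down sketch below shows that dispatch idea with hypothetical stub functions and crude first-keyword routing; the real ReAct agent instead lets the LLM choose a tool based on its `description`.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Tool:
    name: str
    func: Callable[[str], str]
    description: str

# Stub tools; the real funcs are divorce.run and inheritance.run.
tools = [
    Tool("Divorce QA", lambda q: "divorce answer", "divorce laws in Italy"),
    Tool("Inheritance QA", lambda q: "inheritance answer", "inheritance laws in Italy"),
]

def dispatch(question: str) -> str:
    # Route by the first word of each tool's description; a toy stand-in
    # for the LLM-driven tool selection of the ReAct agent.
    for tool in tools:
        keyword = tool.description.split()[0]
        if keyword in question.lower():
            return tool.func(question)
    return tools[0].func(question)  # fallback
```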
Initializing Conversation Memory and ReAct Agent
# Initialize conversation memory and ReAct agent
memory = ConversationBufferMemory(
    memory_key="chat_history", input_key="input", output_key="output"
)
react = initialize_agent(tools, llm, agent="zero-shot-react-description")
agent = AgentExecutor.from_agent_and_tools(
    tools=tools, agent=react.agent, memory=memory, verbose=False
)
- ConversationBufferMemory: The `memory` object keeps track of the chat history. It uses `memory_key`, `input_key`, and `output_key` to store and reference the conversation's context, allowing the chatbot to maintain continuity across interactions and provide personalized advice.
- ReAct Agent: The `initialize_agent` function creates a ReAct agent from the specified `tools` and the language model (`llm`). The `"zero-shot-react-description"` agent type can handle queries without task-specific training, relying on the tools' descriptions and the underlying language model to decide how to respond.
- AgentExecutor: Finally, `agent` is an `AgentExecutor` instance that combines the agent and tools with the conversation memory. This setup lets the chatbot execute tool functions based on user input, manage conversation flow, and keep responses contextually relevant and informed by the chat history.
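Conceptually, `ConversationBufferMemory` just keeps a growing transcript of the exchange and replays it as context. The toy class below is a stand-in illustrating that idea, not the LangChain class or its API:

```python
class BufferMemory:
    """Toy stand-in for ConversationBufferMemory: a growing transcript."""

    def __init__(self):
        self.turns: list[tuple[str, str]] = []

    def save_context(self, user_input: str, output: str) -> None:
        # Record one (human, assistant) exchange.
        self.turns.append((user_input, output))

    def load_history(self) -> str:
        # Render the transcript the way it would be injected into a prompt.
        return "\n".join(f"Human: {u}\nAI: {o}" for u, o in self.turns)

memory = BufferMemory()
memory.save_context("Hi", "Hello! Ask me about Italian law.")
memory.save_context("What is intestate succession?", "It applies when there is no will.")
```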
Streamlit UI Setup
# Streamlit UI Setup
def setup_ui():
    st.set_page_config(page_title="Italian Law Chatbot", page_icon="⚖️")
    st.title("🏛️ Legal Chatbot: Divorce and Inheritance Italy Laws")
    st.write(
        """
        [](https://huggingface.co/spaces/sattari/legal-chat-bot/tree/main)
        [](https://github.com/pouyasattari/Legal-Chatbot-italy-divorce-inheritance)
        [](https://www.sattari.org)
        """
    )
    st.info(
        "Check out the full tutorial to build this app on Streamlit [👉 blog](https://sattari.org/legal-chatbot-divorce-and-inheritance-italy-laws/)",
        icon="ℹ️",
    )
    st.success(
        "Check out the [Prompt Examples List](https://github.com/pouyasattari/Legal-Chatbot-italy-divorce-inheritance/blob/main/promptExamples.txt) to learn how to interact with this ChatBot 🤖",
        icon="✅",
    )
    if "messages" not in st.session_state:
        st.session_state.messages = [
            {
                "role": "assistant",
                "content": "Hello! I'm here to help you with Italian Divorce or Inheritance Law.",
            }
        ]
    # Display previous messages and handle new user input
    for message in st.session_state.messages:
        with st.chat_message(message["role"]):
            st.markdown(message["content"])
    if user_input := st.chat_input("Ask your question in English or Italiano ;)"):
        st.session_state.messages.append({"role": "user", "content": user_input})
        with st.chat_message("user"):
            st.markdown(user_input)
        # Generate and display the chatbot response
        with st.chat_message("assistant"):
            response_placeholder = st.empty()
            response = chatbot(user_input)
            response_placeholder.markdown(response)
        # Append the response to the conversation history
        st.session_state.messages.append({"role": "assistant", "content": response})
Configures the user interface for the chatbot, utilizing Streamlit to provide an interactive and user-friendly platform for users to engage with the bot.
Main Execution Block
if __name__ == "__main__":
setup_ui()
The final section is where the Streamlit UI is initialized and the chatbot is brought to life, ready to assist users with their legal queries.
Deploy the app
Create a New Space:
Click on your profile picture, typically at the top right corner of the page. In the dropdown menu, select "New Space." This starts the process of setting up a new environment for hosting your application.


- Name Your Space: In the setup interface, you’ll be prompted to choose a unique name for your Space. This name will be part of the URL, so choose something descriptive and related to your app, like “ItalianLawChatbot”.
- Select Streamlit as Your SDK: HuggingFace Spaces supports various SDKs for app development. Since your chatbot is built with Streamlit, select the Streamlit option from the SDK dropdown menu. This ensures that the HuggingFace infrastructure provides the correct environment to run your app.
- Set Your Space to Public: To share your legal chatbot with the world, choose the option to make your Space public. This setting allows anyone on the internet to access and interact with your chatbot without any restrictions.
Then you can simply upload your files and click on the "App" section of your Streamlit Space to deploy the chatbot.

Wrapping up
With the deployment of our Streamlit-based legal chatbot, we’ve taken a significant step towards simplifying the complexity of Italian divorce and inheritance laws for the general public. This journey from importing essential libraries, setting up the OpenAI API key, and processing legal documents, to defining sophisticated retrieval mechanisms and deploying an interactive UI, illustrates a seamless blend of technology and legal expertise.
Remember that the source code can be found in the corresponding HuggingFace Space files, or you can access it below:
LegalBot.py – Main Bot Code
README.md
requirements.txt – Dependencies
docs – Directory containing ChromaDB
data – Directory of 2 txt legal files
Thank you for taking the time to read this article; your valuable feedback is warmly welcomed.
Furthermore, I would be happy to help you solve any puzzle in your data journey.
pouya [at] sattari [dot] org