Peque-NLU (Natural Language Understanding) is a Python library that lets you parse sentences written in natural language and extract intents, features, and other information.
For example: quiero conocer el ultimo blogpost de unity ("I want to see the latest Unity blog post")
Result: Timing -> latest, Technology -> unity, Intention -> search
- Feature extraction from text
- Agnostic algorithm: you can use SGD, MLNN, LLMs, Word2Vec, etc.
- 100% Free and Open source
- Chatbots, to detect intent and extract features
- Search engines, to get keywords and intent from semantic information
- Data mining, classifying text and unstructured data without boilerplate
- Python 3.6+
Warning: pip installation is coming soon.
- Clone this repo
git clone git@github.com:HectorPulido/peque-nlu.git
- Install the requirements
pip install -r requirements.txt
- Use the library
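from peque_nlu.intent_engines import SGDIntentEngine
from peque_nlu.intent_classifiers import ModelIntentClassifier

# Placeholder: replace with the path to your own dataset file
DATASET_PATH = "path/to/dataset.json"

# Build the engine and the classifier, then train on the dataset
intent_engine = SGDIntentEngine("spanish")
model = ModelIntentClassifier("spanish", intent_engine)
model.fit(DATASET_PATH)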
prediction = model.multiple_predict(
[
"Hola como te encuentras?",
"Quiero aprender sobre lo último de python",
"describeme usando un meme",
]
)
assert len(prediction) == 3
first_prediction = prediction[0]
assert "intent" in first_prediction
assert "probability" in first_prediction
assert "text" in first_prediction
assert "features" not in first_prediction
assert first_prediction["intent"] == "small_talk"
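For the chatbot use case, the `intent`, `probability`, and `text` keys checked above are enough to route a message to a reply. The following is a minimal sketch, not part of the library: the `route` helper, the handler table, the 0.6 threshold, and the hand-written prediction dictionaries are all hypothetical, and it assumes `probability` is a value between 0 and 1.

```python
def route(prediction: dict, handlers: dict, min_probability: float = 0.6) -> str:
    """Pick a reply for one prediction, falling back when unsure.

    `handlers` maps an intent name to a function that takes the original
    text and returns a reply. The threshold is an arbitrary illustrative value.
    """
    handler = handlers.get(prediction["intent"])
    if handler is None or prediction["probability"] < min_probability:
        return "Sorry, I did not understand that."
    return handler(prediction["text"])


# Hypothetical handlers, one per intent the bot knows how to answer.
handlers = {
    "small_talk": lambda text: "Hi there!",
    "meme": lambda text: "Here is a meme for you.",
}

# Hand-written stand-ins shaped like the predictions checked above.
predictions = [
    {"intent": "small_talk", "probability": 0.93, "text": "Hola como te encuentras?"},
    {"intent": "meme", "probability": 0.41, "text": "describeme usando un meme"},
]

replies = [route(p, handlers) for p in predictions]
print(replies)
```

Here the second message falls below the threshold, so it gets the fallback reply instead of the meme handler.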
You need to provide a dataset to the algorithm before you start. You can use this format as a base:
{
"intents": {
"small_talk": [
"hola",
...
],
"fun_phrases": [
"eres gracioso",
...
],
"meme": [
"¿conoces algun buen meme?",
...
],
"thanks": [
"gracias",
...
]
},
"entities": {
"technology": [
"python",
...
],
"timing": [
"recient",
...
]
}
}
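A dataset in this shape can also be assembled and sanity-checked from Python before training. This is a rough sketch that assumes the dataset is stored as a JSON file with the two top-level keys shown above. The `validate_dataset` helper and its rules (both sections present, each label mapped to a non-empty list of strings) are assumptions made for illustration and are not checks the library is known to perform.

```python
import json
import tempfile
from pathlib import Path


def validate_dataset(dataset: dict) -> None:
    """Raise ValueError if the dataset does not match the layout shown above."""
    for section in ("intents", "entities"):
        if section not in dataset:
            raise ValueError(f"missing top-level key: {section!r}")
        for label, examples in dataset[section].items():
            is_valid = (
                isinstance(examples, list)
                and len(examples) > 0
                and all(isinstance(example, str) for example in examples)
            )
            if not is_valid:
                raise ValueError(f"{section}.{label} must be a non-empty list of strings")


# Tiny illustrative dataset; a real one needs many more examples per label.
dataset = {
    "intents": {
        "small_talk": ["hola", "buenos dias"],
        "thanks": ["gracias", "muchas gracias"],
    },
    "entities": {
        "technology": ["python", "unity"],
        "timing": ["ultimo", "reciente"],
    },
}

validate_dataset(dataset)

# Write the file to a temporary location; ensure_ascii=False keeps accented
# characters such as "¿" and "ú" readable in the output.
dataset_path = Path(tempfile.gettempdir()) / "peque_nlu_dataset.json"
dataset_path.write_text(
    json.dumps(dataset, ensure_ascii=False, indent=2), encoding="utf-8"
)
```

The resulting path is the kind of value that would be passed as `DATASET_PATH`.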
Once your dataset is in this format, you can load it and fit the model.
intent_engine = SGDIntentEngine("spanish")
model = ModelIntentClassifier("spanish", intent_engine)
model.fit(DATASET_PATH)
You can also save and load your models to reduce time and resources.
# Save
saver = PickleSaver()
saver.save(intent_engine, PICKLE_PATH)
# Load
intent_engine_loaded = SGDIntentEngine("spanish")
intent_engine_loaded = saver.load(PICKLE_PATH)
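To make the save and load round trip concrete, here is a small sketch built on Python's standard `pickle` module. `SimplePickleSaver` is a hypothetical stand-in written for this example and is not the library's `PickleSaver`, whose internals may differ. A plain dictionary plays the role of the trained engine.

```python
import pickle
import tempfile
from pathlib import Path


class SimplePickleSaver:
    """Hypothetical stand-in for illustration; not the library's PickleSaver."""

    def save(self, obj, path):
        # Serialize the object to a binary file.
        with open(path, "wb") as handle:
            pickle.dump(obj, handle)

    def load(self, path):
        # Only unpickle files you created yourself: pickle can execute
        # arbitrary code while loading untrusted data.
        with open(path, "rb") as handle:
            return pickle.load(handle)


# A plain dict stands in for a trained engine here.
fake_engine = {"language": "spanish", "labels": ["small_talk", "search"]}

pickle_path = Path(tempfile.gettempdir()) / "peque_nlu_engine.pkl"
saver = SimplePickleSaver()
saver.save(fake_engine, pickle_path)
restored = saver.load(pickle_path)
```

Loading returns a new object equal to the saved one, which is why a saved engine can be reused without fitting again.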
Then you can predict the intent of a text and extract its features:
prediction = model.predict("quiero conocer el ultimo blogpost de unity")
Response:
{
"intent": "search",
"features": [
{
"word": "ultimo",
"entity": "timing",
"similarities": 1
},
{
"word": "otro_ejemplo",
"entity": "otra_entidad",
"similarities": 0.9
}
]
}
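A response like this is usually post-processed before use, for example by a search engine that wants one list of keywords per entity. The sketch below assumes the response layout shown above. The `entities_from_prediction` helper, the 0.8 cut-off, and the sample values are hypothetical and only illustrate the idea.

```python
from collections import defaultdict


def entities_from_prediction(prediction: dict, min_similarity: float = 0.8) -> dict:
    """Group matched words by entity, keeping only confident matches."""
    grouped = defaultdict(list)
    for feature in prediction.get("features", []):
        if feature["similarities"] >= min_similarity:
            grouped[feature["entity"]].append(feature["word"])
    return dict(grouped)


# Illustrative response; the words and scores are made up for this example.
response = {
    "intent": "search",
    "features": [
        {"word": "ultimo", "entity": "timing", "similarities": 1},
        {"word": "unity", "entity": "technology", "similarities": 0.9},
        {"word": "blogpost", "entity": "technology", "similarities": 0.4},
    ],
}

slots = entities_from_prediction(response)
print(response["intent"], slots)
# prints: search {'timing': ['ultimo'], 'technology': ['unity']}
```

The low-scoring match is dropped, so only `ultimo` and `unity` reach the search query.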
Your contributions are greatly appreciated! Please follow these steps:
- Fork the project
- Create your feature branch
git checkout -b feature/MyFeature
- Commit your changes
git commit -m "my cool feature"
- Push to the branch
git push origin feature/MyFeature
- Open a Pull Request
All base code written by me is released under the MIT license.