Segmentation fault on arm64 macOS while searching embeddings index #350

laurids-reichardt opened this issue Oct 2, 2022 · 5 comments


Searching an embeddings index, as demonstrated in the first txtai example, seems to lead to a segmentation fault on Apple Silicon hardware. This is the script I'm executing:

from txtai.embeddings import Embeddings

# Create embeddings model, backed by sentence-transformers & transformers
embeddings = Embeddings({"path": "sentence-transformers/nli-mpnet-base-v2"})

data = [
    "US tops 5 million confirmed virus cases",
    "Canada's last fully intact ice shelf has suddenly collapsed, forming a Manhattan-sized iceberg",
    "Beijing mobilises invasion craft along coast as Taiwan tensions escalate",
    "The National Park Service warns against sacrificing slower friends in a bear attack",
    "Maine man wins $1M from $25 lottery ticket",
    "Make huge profits without work, earn up to $100,000 a day",
]

print("%-20s %s" % ("Query", "Best Match"))
print("-" * 50)

for query in (
    "feel good story",
    "climate change",
    "public health story",
    "war",
    "wildlife",
    "asia",
    "lucky",
    "dishonest junk",
):
    # Get index of the section that best matches the query
    uid = embeddings.similarity(query, data)[0][0]

    print("%-20s %s" % (query, data[uid]))

# Create an index for the list of text
embeddings.index([(uid, text, None) for uid, text in enumerate(data)])

print("%-20s %s" % ("Query", "Best Match"))
print("-" * 50)

# Run an embeddings search for each query
for query in (
    "feel good story",
    "climate change",
    "public health story",
    "war",
    "wildlife",
    "asia",
    "lucky",
    "dishonest junk",
):
    # Extract uid of first result
    # search result format: (uid, score)
    uid = embeddings.search(query, 1)[0][0]

    # Print text
    print("%-20s %s" % (query, data[uid]))


❯ python src/nlp/ 
Query                Best Match
feel good story      Maine man wins $1M from $25 lottery ticket
climate change       Canada's last fully intact ice shelf has suddenly collapsed, forming a Manhattan-sized iceberg
public health story  US tops 5 million confirmed virus cases
war                  Beijing mobilises invasion craft along coast as Taiwan tensions escalate
wildlife             The National Park Service warns against sacrificing slower friends in a bear attack
asia                 Beijing mobilises invasion craft along coast as Taiwan tensions escalate
lucky                Maine man wins $1M from $25 lottery ticket
dishonest junk       Make huge profits without work, earn up to $100,000 a day
Query                Best Match
[1]    10762 segmentation fault  python src/nlp/


I'm happy to provide further information if needed.
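
For anyone reproducing this, a short environment report makes it easier to compare setups. The following is a minimal sketch, not part of txtai: `environment_report` is a hypothetical helper, and the package names it checks (`txtai`, `torch`, `faiss-cpu`, `hnswlib`) are assumptions about which distributions are installed.

```python
import platform
import sys
from importlib import metadata


def environment_report(packages=("txtai", "torch", "faiss-cpu", "hnswlib")):
    """Collect platform details and installed package versions.

    Hypothetical helper for bug reports; the default package names are
    assumptions and may differ from what is installed locally.
    """
    report = {
        "system": platform.system(),      # e.g. "Darwin" on macOS
        "machine": platform.machine(),    # e.g. "arm64" on Apple Silicon
        "python": sys.version.split()[0],
    }
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Package is not installed in this environment
            report[name] = None
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print("%-10s %s" % (key, value))
```

Pasting the printed lines into the issue shows the architecture and library versions at a glance.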

Just confirmed that the example above runs fine on Linux amd64 hardware with CUDA support.

Thanks for giving txtai a try and taking the time to submit an issue!

Couple ideas:

  1. Try downgrading pytorch to 1.11. pytorch==1.12.x has had segfault issues on macOS.
pip install torch==1.11.0 torchvision==0.12.0
  2. Try using a different index backend. While Faiss is supported on Apple Silicon, I'm not sure how well supported it is.
embeddings = Embeddings({"path": "sentence-transformers/nli-mpnet-base-v2", "backend": "hnsw"})
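
The backend switch above could also be made conditional, so the same script keeps the default backend elsewhere. This is only a sketch under assumptions: `embeddings_config` is a hypothetical helper, and it assumes a native arm64 Python on macOS reports `Darwin` / `arm64` from the `platform` module. The `hnsw` backend may also need an extra dependency to be installed.

```python
import platform


def embeddings_config(path, system=None, machine=None):
    """Build an Embeddings config dict, choosing "hnsw" on Apple Silicon.

    Hypothetical helper; `system` and `machine` default to the values
    reported by the running interpreter and exist mainly for testing.
    """
    system = system or platform.system()
    machine = machine or platform.machine()

    config = {"path": path}
    if system == "Darwin" and machine == "arm64":
        # Work around the reported segfault by avoiding the default backend
        config["backend"] = "hnsw"
    return config


# Usage (requires txtai to be installed):
# from txtai.embeddings import Embeddings
# embeddings = Embeddings(embeddings_config("sentence-transformers/nli-mpnet-base-v2"))
```

On other platforms the returned dict has no `backend` key, so the library's default applies.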

Unfortunately, I don't use Apple hardware, so it would be tough for me to debug or reproduce this. txtai does have GitHub Actions for macOS, but they're x86-64 based. There is a long-standing issue to add Apple Silicon support to GitHub Actions, but it looks like it's currently unresolved.

Hi @davidmezzetti, thank you for developing and publishing this great library!

Indeed, changing the backend to hnsw worked out. Thanks for the tip!

Yes, unfortunately many ML libraries only partially support macOS and/or arm64. In most cases bigger experiments or production workloads will run on Linux with CUDA anyway, but it's always nice to be able to try out libraries on local hardware first. Great to see that it's possible to run txtai on Apple hardware!

Glad to hear it!

romainr commented Feb 20, 2023


FWIW, I ran the pip command from #350 (comment) along with pip install txtai[similarity], and it seems to work now even without the hnsw backend.

