WIP: Experimental AI Integration in QualCoder #875
Hello Kai. I have seen your video and will download and try your code. Thank you very much for this, it looks like a great addition to QC functionality. Your programming skills are excellent :) as you have really understood the QC code and implemented the GPT-4 AI integration, which is way beyond my current skills. (Lots to read and try to understand.) I do package the software as an executable for Windows and Ubuntu, using pyinstaller. Many end users prefer this as it is easy for them to use QC. (This is also why I package the icons and language files as base64: this worked better for me than trying to package data files within pyinstaller. I don't use the spec file; I think it is only stored on the main code page for historical reasons. But I see that in your fork it has been modified for pyinstaller use, so that is another thing for me to try out.) My concern is whether it would still package up nicely in this way for those users who cannot use the command line installation methods. I presume yes, but I need to test. And I'm glad you (and others) like using QualCoder. With regards, Colin |
I think it might be best to release a 3.5 version without AI very soon. So that the more recent features can be out there 'in the wild'. Then incorporate the AI in the subsequent 3.6. |
@AndrzejWawa @amru39 Ah, it seems that you don't have access to GPT-4. If we follow the link provided (https://help.openai.com/en/articles/7102672-how-can-i-access-gpt-4), we find the following disclaimer: Really annoying, sorry for that. Why does OpenAI give new users free credits if they can't use them on their state of the art model? If you try, please report back if this fixes the problem, thanks. |
Yes, I would consider my AI functionality an experimental feature at the moment. Let's wait with the integration in the main version until it improves a bit.
Yes, it would be very nice to add binaries. I've created a binary + installer for Windows. My Linux skills are not good enough to do the same for Ubuntu. Having macOS binaries would also be very nice. Thank you Colin for providing and maintaining this great open source project with so much effort. It's really a shame that we have so little open source stuff in the field of qualitative data analysis. |
I've also created an issue on my fork about the "InvalidRequestError": kaixxx#1 |
It started working once I made the payment (USD 5). Will try using AI search for a bunch of articles I am reviewing to see how it works and how many credits it uses. |
OK. Well, I think the plan for now is, I will try and update the translation files. EDIT: I will not update the translation files. I am finding that on Windows the two translation apps, xgettext and Qt Linguist, just do not seem to work. |
I tested the AI and I think it works quite well. A couple of searches cost me about $0.40, which is quite OK, I think. In general, for me the functionality is ready to be included in the formal release. But, of course, there could be further improvements, which I suggested in the issues for the QC AI project. |
Sorry for being so quiet over the last couple of weeks. I had a little winter break, but was also working on things in the background:
Other than that, I was always ready to fix bugs and problems with the first beta version. But nothing serious popped up which really surprised me. It shows how robust QualCoder is as a platform (and maybe also that I’ve learned from the early mistakes in my other AI-based software-project, noScribe). For the future I have several ideas how to develop this further:
What do you think, should we focus more on integrating the ai-based functions as they are right now (with some minor improvements) in a next release of QualCoder? Or should we keep the ai-version separate for a little longer so that it can become more advanced and mature? I’m open for both directions. |
Some quick comments to your suggestions:
In my case, "LongPathsEnabled" was already enabled. But even when I disabled it, QualCoder was working fine (thinking about it now: maybe I need to reboot). Do you remember which files caused the problems?
I will change that, you are right. In an early stage of the development, I was thinking about implementing several ai-powered agents to chat with, maybe even giving them names... But I moved away from this idea for now.
Yeah, I will try to explain this a little better in the status messages. |
Hi there, yes, I think all of your ideas are good. Yes, an open source AI model would be better. How does the AI work with the code name/memo? I guess it is looking for word similarities. In that case, more end user instructions on filling out good code names/memos would be beneficial. Chat with AI: yes, I guess this could be a good feature. Regarding the AI and chat: Is this English language focused, or are there options to use it in multiple languages? Just curious really. Another AI related function, an idea only, might be to analyse images? Yes, if you feel the AI is good and the feedback has been positive, integration should proceed. I am also wondering, based on the feedback from others, how best to further develop QualCoder: AND importantly, I feel, would it be better to have a bigger group working on QualCoder? |
Hi Colin,
The AI is looking for semantic similarities on the level of sentences. The process consists of three major steps: 1) "Memorizing" the semantic content of a document:
2) AI-based search: The basic principle is to search for semantic similarities between the code-name and the chunks of data in the vector store. In practice, the process is a little more involved:
3) Refining the results with GPT-4: The result of step 2 will be a long list that still contains many pieces of data that are only marginally related to the code. To narrow the results down, I send the top 12 entries of the list to GPT-4 with a prompt to
Is the Wiki here on GitHub still the main user manual for QualCoder? I think I'm going to add two pages: One where I describe the background of the AI search (basically like above but with some additional methodological notes) and one where I go more into the practical side of coding with AI. What do you think? |
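For readers who want a concrete picture of the three steps described above, here is a minimal sketch in Python. It is not the code from the fork, only an illustration under stated assumptions: the bag-of-words `embed` function is a toy stand-in for a real embedding model, a plain in-memory list stands in for the vector store, and `refine` is a hypothetical placeholder for the GPT-4 step that does not call any model. Only the cut-off of 12 results is taken from the description above.

```python
import math
import re
from collections import Counter

TOP_K = 12  # the description above mentions sending the top 12 entries to GPT-4


def split_into_chunks(text):
    """Step 1a: split a document into sentence-level chunks."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_store(text):
    """Step 1b: 'memorize' the document as (chunk, vector) pairs."""
    return [(chunk, embed(chunk)) for chunk in split_into_chunks(text)]


def search(store, code_name, memo="", top_k=TOP_K):
    """Step 2: rank chunks by similarity to the code name (plus memo)."""
    query = embed(f"{code_name} {memo}")
    scored = [(cosine(query, vec), chunk) for chunk, vec in store]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Keep only the best top_k chunks that share something with the query.
    return [chunk for score, chunk in scored[:top_k] if score > 0]


def refine(candidates, code_name):
    """Step 3 placeholder (hypothetical): a real version would ask a
    language model to judge each candidate against the code."""
    return candidates  # no model is called in this sketch


document = (
    "I lost my job last year. The weather was nice on Sunday. "
    "Finding a new job took months. We had pasta for dinner."
)
store = build_store(document)
hits = refine(search(store, "job loss", memo="losing a job"), "job loss")
print(hits)
```

Because the toy embedding only counts shared words, it also shows why a real semantic embedding matters: a sentence such as "I was made redundant" would not be found here, although it clearly belongs to the code.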
I think that integration with the next release will be more convenient from the users' perspective than keeping QCAI separate. As to data privacy: maybe there should be a warning window for the use of AI, stating that data should be anonymised before AI analysis and that it will be sent to a third party? Do you think that there is a risk of leakage? |
Some thoughts on the future development of QualCoder:
Yes, definitely. This project is quite big for a single person to maintain. (I don’t know how many other people are involved right now.) But from my experience it is very difficult to find people that are both experienced in qualitative social research AND in programming. It would be very good to have a couple of people who would volunteer to take responsibility for certain modules of the project. I could do that for the AI-integration (answering questions and bug reports related to this topic, plan next steps, keep the libraries up to date, etc.). Somebody else could be responsible for the macOS-version (testing, compiling, updating the manual…), etc. But as I said, it is not easy to find these people, I guess.
QualCoder has a lot of functionality, and you always add to it. I don’t think that’s a problem. When it comes to AI in particular, the functionality in the commercial software packages is often quite underwhelming, especially compared with the huge marketing promises they make. Look for instance at this critical assessment of the AI-based functions in ATLAS.ti: https://youtu.be/QwMe6akHhvY If you want to achieve a wider adoption of QualCoder, I would suggest focusing on two key points:
I’m not sure. I’ve seen a lot of software projects where people get funded for one or two years, develop a prototype, publish a paper, and abandon the project shortly after that. The reason why QualCoder survived and continues to flourish is that there is a person behind it – you – that is really identified with the project and keeps it going no matter what. |
Good point!
No, I don't think that there is a risk of leakage, at least not in the particular functions that I use. As I explained in the video, any rumour in this direction would hurt the business model of OpenAI very much. But I can understand why people are generally a little suspicious when it comes to OpenAI and data protection. Anonymising a whole interview is basically impossible IMHO, especially if you are working against a large AI model that is very good at deanonymizing text... |
These are not in my skill set. For 2: This could be possible if a database such as MySQL, MariaDB or similar was used, which could be accessed at the same time across the internet. However, the downsides are: It would be a lot, lot harder to install than the SQLite database that is currently used. A lot of testing would be needed to ensure that functions used by different people at the same time did not clash, and that the auto updating of the codes tree etc. occurred. I feel this could be beyond my skills. One thing I do like about using the SQLite database is that it is easy to zip the project and unzip it elsewhere for anyone to use. Yes, the ongoing updating and responding to issues is becoming more difficult or burdensome. |
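The portability point about SQLite can be illustrated with a short sketch: because the whole project is a folder containing a single database file, copying it elsewhere is a plain zip and unzip, with no database server involved. The folder name `example_project`, the file name `data.db` and the table `code_name` below are made up for illustration and are not meant to describe QualCoder's actual project layout.

```python
import shutil
import sqlite3
import tempfile
from pathlib import Path


def pack_project(project_dir, archive_base):
    """Zip the contents of a project folder; returns the path of the .zip."""
    return Path(shutil.make_archive(str(archive_base), "zip", root_dir=project_dir))


def unpack_project(archive_path, target_dir):
    """Unzip the archive into target_dir and return that folder."""
    shutil.unpack_archive(str(archive_path), str(target_dir))
    return Path(target_dir)


with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)

    # Create a tiny stand-in project: one folder with one SQLite file.
    project = tmp / "example_project"
    project.mkdir()
    con = sqlite3.connect(project / "data.db")
    con.execute("CREATE TABLE code_name (name TEXT)")
    con.execute("INSERT INTO code_name VALUES ('job loss')")
    con.commit()
    con.close()

    # Zip it, then unzip it in a different location.
    archive = pack_project(project, tmp / "example_project_backup")
    archive_suffix = archive.suffix
    restored = unpack_project(archive, tmp / "restored")

    # The restored copy is a fully working database.
    con = sqlite3.connect(restored / "data.db")
    rows = con.execute("SELECT name FROM code_name").fetchall()
    con.close()
    print(rows)
```

A server-based database would need an export and import step plus a running server on the other machine, which is the installation burden described above.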
@kaixxx Ok I have added you as a collaborator on the project. you will get a request from github. |
A quick update, I have good news: The model Mixtral 8x7b was trained by the French company Mistral AI: https://mistral.ai/news/mixtral-of-experts/. It is considered an “open-weight model”. We don’t get the training data, but the model itself is freely available under the Apache 2.0 license. The performance is on par or slightly better than GPT-3.5, not quite on the level of GPT-4. But from my initial tests it seems good enough for our purpose. The model handles English, French, Italian, German and Spanish. My idea is to keep GPT-4 as a second option, mainly because it supports more languages. I see several advantages using Mixtral:
I have limited time right now because the new semester starts next week. But my plan is to try out the new AI model and work on integrating it in QualCoder over the next couple of weeks. This would then also be a good moment to add the AI functionality to the main version of QualCoder, I would suggest. @ccbogel: Thank you for adding me as a collaborator! |
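The idea of using Mixtral by default and keeping GPT-4 as a second option for other languages could look roughly like the following sketch. The language list is taken from the comment above; the function name, the two-letter language codes and the model identifier strings are assumptions for illustration only, not names used in QualCoder or by any provider.

```python
# Languages that Mixtral 8x7b handles, according to the comment above.
MIXTRAL_LANGUAGES = {"en", "fr", "it", "de", "es"}


def choose_model(language_code):
    """Pick a model label for a project language (hypothetical helper).

    Falls back to GPT-4 for any language outside the Mixtral list.
    """
    lang = language_code.strip().lower()[:2]  # "de-DE" -> "de"
    return "mixtral-8x7b" if lang in MIXTRAL_LANGUAGES else "gpt-4"


for code in ("en", "de-DE", "pl"):
    print(code, "->", choose_model(code))
```

In practice the choice would probably remain a user setting, since the better model for a given project also depends on cost and data protection requirements.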
(Sorry, closed this by accident.) |
@kaixxx Are you considering adding the collaboration feature? This would be a killer feature for university researchers. We're trying to use QualCoder on a project with five coders and the current version of QualCoder is not really practical for that. |
@MicRaving: Let's continue the discussion about collaboration features here: #894 |
That's me! :-)
Keep in mind that the number of GPUs I have is limited so far, so if newer, more interesting models appear, I remove the old ones. But to cope with code which I don't want to see broken just because of some change of mind, I made aliases which I intend to keep. For example, as of today, Mistral 7B v0.2 is aliased as You can always query the models with a
It's nice that you find it useful, and hope this helps with stability! |
@surak: Hi Alexandre, nice to see you here. I'm in the process of implementing the connection to blablador. I'm using |
Sure, as soon as there's something better (was thinking about jamba and dbrx, but not yet), I will write on the mailing list right away! |
I’m happy to share a new experimental version of QualCoder with some AI-enhanced functionality: https://github.com/kaixxx/QualCoder/tree/ai_integration
If you want to see it in action, check out the video: https://www.youtube.com/watch?v=FrQyTOTJhCc
I would really like to see this incorporated in the regular version of QualCoder in the future. But for now, I've created a separate version so we can experiment without bothering the regular users of QualCoder.
Since I'm using a separate config folder, both versions should be able to run alongside each other without any problems. Also, the database format hasn't changed. The only thing I do is add a vector database to any project opened with the AI-enhanced version. This is used for the semantic search. Since it resides in its own directory in the project folder, it should also not interfere with the regular QualCoder.
Thank you very much for this great piece of open-source software! I'm curious to know what the QualCoder community thinks about my additions.
All the best
Kai