Connect with us

Uncategorized

NVIDIA NIM Enhances RAG Applications for Veterinary AI | IDOs News

Avatar

Published

on

NVIDIA NIM Enhances RAG Applications for Veterinary AI | IDOs News




Iris Coleman
Aug 27, 2024 19:56

NVIDIA NIM improves retrieval-augmented generation (RAG) applications, streamlining AI solutions in specialized fields like veterinary science.





The advent of large language models (LLMs) has significantly benefited the AI industry, offering versatile tools capable of generating human-like text and handling a wide range of tasks. However, while LLMs demonstrate impressive general knowledge, their performance in specialized fields, such as veterinary science, is limited when used out of the box. To enhance their utility in specific areas, two primary strategies are commonly adopted in the industry: fine-tuning and retrieval-augmented generation (RAG).

Fine-Tuning vs. RAG

Fine-tuning involves training the model on a carefully curated and structured dataset, demanding substantial hardware resources, as well as the involvement of domain experts, a process that is often time-consuming and costly. Unfortunately, in many fields, it’s incredibly challenging to access domain experts in a way that is compatible with business constraints.

Conversely, RAG involves building a comprehensive corpus of knowledge literature, alongside an effective retrieval system that extracts relevant text chunks to address user queries. By adding this retrieved information to the user query, LLMs can produce better answers. Although this approach still requires subject matter experts to curate the best sources for the dataset, it is more tractable and business-compatible than fine-tuning. Also, since extensive training of the model isn’t necessary, this approach is less computationally intensive and more cost-effective.

NVIDIA NIM and NLP Pipelines

NVIDIA NIM streamlines the design of NLP pipelines using LLMs. These microservices simplify the deployment of generative AI models across platforms, allowing teams to self-host LLMs while offering standard APIs to build applications.

NIM abstracts model inference internals like execution engines and runtime operations, ensuring optimal performance with TensorRT-LLM, vLLM, and others. Key features include:

  • Scalable deployment
  • Support for diverse LLM architectures with optimized engines
  • Flexible integration into existing workflows
  • Enterprise-grade security with safetensors and constant CVE monitoring

Developers can run NIM microservices with Docker and perform inference using APIs. Specialized trained model weights can also be used for specific tasks, such as document parsing, by modifying container commands.

Reimagining Veterinary Care with AI

At AITEM, a member of the NVIDIA Inception Program for startups, collaboration with NVIDIA has focused on AI-based solutions across multiple fields, including industrial and life sciences. In the veterinary sector, AITEM is working on LAIKA, an innovative AI copilot designed to assist veterinarians by processing patient data and offering diagnostic suggestions, guidance, and clarifications.

LAIKA integrates multiple LLMs and RAG pipelines. The RAG component retrieves relevant information from a curated dataset of veterinary resources. During preparation, each resource is divided into chunks, with embeddings calculated and stored in the RAG database. During inference, the query is pre-processed and its embeddings are computed and compared with those in the RAG database using geometric distance metrics. The closest matches are selected as the most relevant and used to generate responses.

Due to potential redundancy in the RAG database, multiple retrieved chunks might contain the same information, limiting the diversity of concepts provided to the answer system. To address this, LAIKA employs the Maximal Marginal Relevance (MMR) algorithm to minimize chunk redundancy and ensure a broader range of relevant information.

NVIDIA NeMo Retriever Reranking NIM Microservice

The NVIDIA API Catalog includes NeMo Retriever NIM microservices that enable organizations to seamlessly connect custom models to diverse business data and deliver highly accurate responses. The NVIDIA Retrieval QA Mistral 4B reranking NIM microservice is designed to assess the probability that a given text passage contains relevant information for answering a user query. Integrating this model into the RAG pipeline enables filtering out retrievals that do not pass the reranking model’s evaluation, ensuring that only the most relevant and accurate information is used.

To assess the impact of this step on the RAG pipeline, AITEM designed an experiment:

  1. Extract a dataset of ~100 anonymized questions from LAIKA users.
  2. Run the current RAG pipeline to retrieve chunks for each question.
  3. Sort the retrieved chunks based on probabilities provided by the reranking model.
  4. Evaluate each chunk for relevance to the query.
  5. Analyze the reranking model’s probability distribution in relation to the relevance determined in Step 4.
  6. Compare the ranking of chunks in Step 3 against their relevance from Step 4.

User questions in LAIKA can vary significantly in form. Some queries contain detailed explanations of a situation but lack a specific question. Others contain precise inquiries regarding research, while some seek guidance or differential diagnoses based on clinical cases or analysis documents.

Due to the large number of chunks per question, AITEM used the Llama 3.1 70B Instruct NIM microservice for evaluation, which is also available in the NVIDIA API Catalog.

To better understand the reranking model’s performance, specific queries and model responses were examined in detail. Table 1 highlights the top and bottom reranked chunks for a sample query regarding differential diagnoses for a cat losing weight.

Text Reranking Logit
Causes of weight loss that can be particularly difficult to diagnose … include gastric disease not causing vomiting, intestinal disease not causing vomiting or diarrhea, hepatic disease … 3.3125
Differential diagnoses for nonspecific signs like anorexia, weight loss, vomiting, and diarrhea … acute pancreatitis is rare in cats, … signs are nonspecific and ill-defined (anorexia, lethargy, weight loss). 2.3222
Severe weight loss (with or without increased appetite) may be noted where there is cancer cachexia, maldigestion/malabsorption … Appetite may be increased in some conditions, such as hyperthyroidism in cats, … However, a normal appetite does not rule out the presence of a serious condition. 2.2265
Overall, weight loss was the most common presenting sign … with little difference between the groups … -5.0078
Other client complaints include lethargy, anorexia, weight loss, vomiting … -7.3672
There were 6 British Shorthair, 4 European Shorthair, and 1 Bengal cat … Reported clinical signs by owners included: reduced appetite or anorexia… -10.3281
Table 1. Three highest-ranked chunks and three lowest-ranked text chunks

Figure 4 compares the reranking model probability output distribution (in logits) between relevant (good) and irrelevant (bad) chunks. The probabilities for good chunks are higher compared to bad chunks, and a t-test confirmed that this difference is statistically significant, with a p-value lower than 3e-72.

NVIDIA NIM Enhances RAG Applications for Veterinary AI | IDOs News
Figure 4. Distribution of reranking model output in terms of logits

Figure 5 shows the distribution difference in the reranking-induced sorting positions: good chunks are predominantly in top positions, while bad chunks are lower. The Mann-Whitney test confirmed that these differences are statistically significant, resulting in a p-value lower than 9e-31.

distribution-good-bad-chunks-model-sorting-625x458.png
Figure 5. Distribution of reranking model-induced sorting among the retrieved chunks

Figure 6 shows the ranking distribution and helps define an effective cutoff point. In the top five positions, most chunks are good, while the majority of chunks in positions 11-15 are bad. Thus, retaining only the top five retrievals or another chosen number can serve as one way to effectively exclude most of the bad chunks.

good-bad-chunk-balance-model-sorting-625x472.png
Figure 6. Balance between good and bad chunks by position in the sorting induced by the reranking model

To optimize retrieval pipelines, and minimize ingestion costs while maximizing accuracy, a lightweight embedding model can be paired with the NVIDIA reranking NIM microservice, to boost retrieval accuracy. Execution time can be improved by 1.75x (Figure 7).

nv-rerankqa-mistral4b-v3-comparison-625x711.png
Figure 7. NVIDIA reranking NIM microservice comparison

Better Answers with the NVIDIA Reranking NIM Microservice

The results demonstrate that adding the NVIDIA reranking NIM microservice to the LAIKA RAG pipeline positively affects the relevance of retrieved chunks. By forwarding more precise, specialized information to the downstream answering LLM, it equips the model with the knowledge that’s necessary for highly specialized fields like veterinary science.

The NVIDIA reranking NIM microservice, available in the NVIDIA API Catalog, simplifies adoption as you can easily pull and run the model and infer its evaluations through APIs. This eliminates stress related to environment settings and manual optimization, as it comes pre-quantized and optimized with NVIDIA TensorRT for almost any platform.

For more information and the latest updates about LAIKA and other AITEM projects, see AITEM Solutions and follow LAIKA and AITEM on LinkedIn.

Image source: Shutterstock



Continue Reading
Click to comment

Leave a Reply

Your email address will not be published. Required fields are marked *

Uncategorized

Advantages of Mobile Apps in Gambling: The Example of Pin Up App | IDOs News

Avatar

Published

on

Advantages of Mobile Apps in Gambling: The Example of Pin Up App | IDOs News


By Terry Ashton, updated August 31, 2024

Online gambling is going mobile — over 50% of players are already playing casino games on their mobile devices, and their number is expected to grow further. But does a mobile app have actual advantages over browser-based play? We decided to do more profound research by accessing and trying gambling on a desktop browser, mobile browser, and the app. That allowed us to distinguish casino mobile applications’ key benefits and drawbacks. If you’re considering using one, just keep reading — we will share some helpful insights below. 

Benefits of Mobile Play at Pin Up Casino

The rise of online gambling happens for multiple reasons, including the following ones: 

  • Ultimate accessibility. You can access the app anywhere, even on the go. You don’t need to take additional actions — the casino opens with just one click. 
  • Lower Internet requirements, offline play. If you play for fun, you can do it even without an Internet connection. If you prefer to play real money, the requirements for an Internet connection will still be much lower because most data is already downloaded to your device. 
  • Push notifications. You can immediately learn about the new top promotions and the hottest games without checking your email. 
  • Special bonuses. Sometimes, special bonuses are granted to mobile players. Some casinos may add them occasionally to encourage players to play on apps. 
  • The same game selection. If a casino is modern and cooperates with top providers, all games will be compatible with mobile devices. For instance, if you play at Pin Up casino online, you can access the same collection of games. That goes not only for slots but also for live games, table games, etc. 
  • Higher security standards. The app is protected even better than the site. Data is encrypted, and the chance that anyone will access your account is close to zero. 

Registration also goes smoothly. Once you sign up on the browser or app, you can access the platform with just one click by entering your Pin Up login and password. 

Considering the Cons: Potential Drawbacks of Using a Pin-Up Mobile App 

Nothing is perfect, and neither are casino apps. Gamblers should also consider the drawbacks, and the most common ones are as follows: 

  • Installing software is a must. You need to install the software on your phone. It’s safe if it’s the official casino site and a good product. However, clicking on the wrong link and downloading the wrong APK file may result in problems. 
  • Battery drain and storage space. It’s no secret that charging the phone all the time is annoying, and innovative slots with top graphics may drain your battery quickly. Also, though most apps don’t take much space (in the case of Pin Up, it’s just about 100 Mb), they still require more effort to manage it. 
  • Compatibility requirements. Any app will have technical requirements, and most aren’t compatible with old mobile devices and tablets. Also, you’ll need to install updates quite regularly. 
  • Smaller screen. This is a disadvantage for those who prefer playing on larger screens, particularly those who prefer live dealer games. 

Do the pros outweigh the cons for you? If yes, the mobile app will boost your experience. If not, browser play may be a better option. 

Final Thoughts: The App vs. Browser Play at Pin-Up Casino

Technology is shaping the industry. Nowadays, there’s no such significant difference between playing on a mobile app and a mobile or desktop browser. You get the same game selection, the same bonuses, and the same smooth experience. So, it’s a matter of taste. Choose what will work best for you and enjoy your play.


Continue Reading

Uncategorized

NVIDIA Introduces Fast Inversion Technique for Real-Time Image Editing | IDOs News

Avatar

Published

on

NVIDIA Introduces Fast Inversion Technique for Real-Time Image Editing | IDOs News




Terrill Dicki
Aug 31, 2024 01:25

NVIDIA’s new Regularized Newton-Raphson Inversion (RNRI) method offers rapid and accurate real-time image editing based on text prompts.





NVIDIA has unveiled an innovative method called Regularized Newton-Raphson Inversion (RNRI) aimed at enhancing real-time image editing capabilities based on text prompts. This breakthrough, highlighted on the NVIDIA Technical Blog, promises to balance speed and accuracy, making it a significant advancement in the field of text-to-image diffusion models.

Understanding Text-to-Image Diffusion Models

Text-to-image diffusion models generate high-fidelity images from user-provided text prompts by mapping random samples from a high-dimensional space. These models undergo a series of denoising steps to create a representation of the corresponding image. The technology has applications beyond simple image generation, including personalized concept depiction and semantic data augmentation.

The Role of Inversion in Image Editing

Inversion involves finding a noise seed that, when processed through the denoising steps, reconstructs the original image. This process is crucial for tasks like making local changes to an image based on a text prompt while keeping other parts unchanged. Traditional inversion methods often struggle with balancing computational efficiency and accuracy.

Introducing Regularized Newton-Raphson Inversion (RNRI)

RNRI is a novel inversion technique that outperforms existing methods by offering rapid convergence, superior accuracy, reduced execution time, and improved memory efficiency. It achieves this by solving an implicit equation using the Newton-Raphson iterative method, enhanced with a regularization term to ensure the solutions are well-distributed and accurate.

Comparative Performance

Figure 2 on the NVIDIA Technical Blog compares the quality of reconstructed images using different inversion methods. RNRI shows significant improvements in PSNR (Peak Signal-to-Noise Ratio) and run time over recent methods, tested on a single NVIDIA A100 GPU. The method excels in maintaining image fidelity while adhering closely to the text prompt.

Real-World Applications and Evaluation

RNRI has been evaluated on 100 MS-COCO images, showing superior performance in both CLIP-based scores (for text prompt compliance) and LPIPS scores (for structure preservation). Figure 3 demonstrates RNRI’s capability to edit images naturally while preserving their original structure, outperforming other state-of-the-art methods.

Conclusion

The introduction of RNRI marks a significant advancement in text-to-image diffusion models, enabling real-time image editing with unprecedented accuracy and efficiency. This method holds promise for a wide range of applications, from semantic data augmentation to generating rare-concept images.

For more detailed information, visit the NVIDIA Technical Blog.

Image source: Shutterstock



Continue Reading

Uncategorized

AMD Radeon PRO GPUs and ROCm Software Expand LLM Inference Capabilities | IDOs News

Avatar

Published

on

AMD Radeon PRO GPUs and ROCm Software Expand LLM Inference Capabilities | IDOs News




Felix Pinkston
Aug 31, 2024 01:52

AMD’s Radeon PRO GPUs and ROCm software enable small enterprises to leverage advanced AI tools, including Meta’s Llama models, for various business applications.





AMD has announced advancements in its Radeon PRO GPUs and ROCm software, enabling small enterprises to leverage Large Language Models (LLMs) like Meta’s Llama 2 and 3, including the newly released Llama 3.1, according to AMD.com.

New Capabilities for Small Enterprises

With dedicated AI accelerators and substantial on-board memory, AMD’s Radeon PRO W7900 Dual Slot GPU offers market-leading performance per dollar, making it feasible for small firms to run custom AI tools locally. This includes applications such as chatbots, technical documentation retrieval, and personalized sales pitches. The specialized Code Llama models further enable programmers to generate and optimize code for new digital products.

The latest release of AMD’s open software stack, ROCm 6.1.3, supports running AI tools on multiple Radeon PRO GPUs. This enhancement allows small and medium-sized enterprises (SMEs) to handle larger and more complex LLMs, supporting more users simultaneously.

Expanding Use Cases for LLMs

While AI techniques are already prevalent in data analysis, computer vision, and generative design, the potential use cases for AI extend far beyond these areas. Specialized LLMs like Meta’s Code Llama enable app developers and web designers to generate working code from simple text prompts or debug existing code bases. The parent model, Llama, offers extensive applications in customer service, information retrieval, and product personalization.

Small enterprises can utilize retrieval-augmented generation (RAG) to make AI models aware of their internal data, such as product documentation or customer records. This customization results in more accurate AI-generated outputs with less need for manual editing.

Local Hosting Benefits

Despite the availability of cloud-based AI services, local hosting of LLMs offers significant advantages:

  • Data Security: Running AI models locally eliminates the need to upload sensitive data to the cloud, addressing major concerns about data sharing.
  • Lower Latency: Local hosting reduces lag, providing instant feedback in applications like chatbots and real-time support.
  • Control Over Tasks: Local deployment allows technical staff to troubleshoot and update AI tools without relying on remote service providers.
  • Sandbox Environment: Local workstations can serve as sandbox environments for prototyping and testing new AI tools before full-scale deployment.

AMD’s AI Performance

For SMEs, hosting custom AI tools need not be complex or expensive. Applications like LM Studio facilitate running LLMs on standard Windows laptops and desktop systems. LM Studio is optimized to run on AMD GPUs via the HIP runtime API, leveraging the dedicated AI Accelerators in current AMD graphics cards to boost performance.

Professional GPUs like the 32GB Radeon PRO W7800 and 48GB Radeon PRO W7900 offer sufficient memory to run larger models, such as the 30-billion-parameter Llama-2-30B-Q8. ROCm 6.1.3 introduces support for multiple Radeon PRO GPUs, enabling enterprises to deploy systems with multiple GPUs to serve requests from numerous users simultaneously.

Performance tests with Llama 2 indicate that the Radeon PRO W7900 offers up to 38% higher performance-per-dollar compared to NVIDIA’s RTX 6000 Ada Generation, making it a cost-effective solution for SMEs.

With the evolving capabilities of AMD’s hardware and software, even small enterprises can now deploy and customize LLMs to enhance various business and coding tasks, avoiding the need to upload sensitive data to the cloud.

Image source: Shutterstock



Continue Reading

Trending