LogVisor AI is tightly integrated with ClusterVisor, classifying logs with a severity so alerts can be automatically created.
What is LogVisor AI?
LogVisor AI is an artificial intelligence log file analysis tool that is available as part of ClusterVisor.
Linux systems produce a lot of log data, so much that most of it goes unread. While many of those log entries are benign status or debug messages, some of them are very important. With the huge quantity of messages each server produces, it’s very easy to miss important messages about system or security problems. The problem is compounded on an HPC system with tens to thousands of nodes. It’s the proverbial needle in a haystack.
How do you find the important messages in your logs without spending all day reading every single message? That’s where LogVisor AI can help.
The AI model
Advanced Clustering has drawn on more than 20 years of experience deploying and supporting customers’ HPC systems to build a custom AI model that categorizes log entries by severity. The model was trained on log files sent by customers to our support teams. ACT engineers have gone through the painstaking task of labeling individual log messages with the appropriate severity level: neutral, warning, error, or critical. This labeled data is then used to train the AI model, so it can automatically categorize all of your logs for you.
How does it work?
Most HPC clusters set up syslog to push logs into a central place, and ClusterVisor is no exception. Your ClusterVisor appliance or server becomes the central syslog server for all nodes in your cluster. As new syslog entries are received, they are sent through the AI model for inference and severity classification, then stored in our log database. On the standard ClusterVisor appliance, we can easily run inference on more than 2,000 log entries per second.
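The receive, classify, and store flow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `LogEntry` type, the `parse_syslog`, `classify`, and `ingest` functions, and the in-memory list used as a store are all hypothetical, and the keyword-based `classify` is a stand-in for the trained model, not a description of how LogVisor AI actually works.

```python
import re
from dataclasses import dataclass


@dataclass
class LogEntry:
    """Hypothetical record; the real LogVisor AI schema is not documented here."""
    host: str
    process: str
    message: str
    severity: str = "neutral"


# Classic BSD-style syslog line: "Mon DD HH:MM:SS host process[pid]: message"
SYSLOG_RE = re.compile(
    r"^\w{3}\s+\d{1,2}\s[\d:]{8}\s"
    r"(?P<host>\S+)\s"
    r"(?P<process>[^\s\[:]+)(?:\[\d+\])?:\s"
    r"(?P<message>.*)$"
)


def parse_syslog(line):
    """Return a LogEntry, or None if the line does not match the expected layout."""
    m = SYSLOG_RE.match(line)
    if m is None:
        return None
    return LogEntry(m["host"], m["process"], m["message"])


def classify(message):
    """Placeholder keyword heuristic standing in for model inference."""
    text = message.lower()
    if "panic" in text or "uncorrectable" in text:
        return "critical"
    if "error" in text or "failed" in text:
        return "error"
    if "warn" in text:
        return "warning"
    return "neutral"


def ingest(lines, store):
    """Parse each received line, attach a severity, and append it to the store."""
    for line in lines:
        entry = parse_syslog(line)
        if entry is None:
            continue  # skip lines that are not in the expected format
        entry.severity = classify(entry.message)
        store.append(entry)


store = []
ingest(
    [
        "Jan  5 10:00:00 node01 kernel: Kernel panic - not syncing",
        "Jan  5 10:00:01 node02 sshd[1234]: Accepted publickey for alice",
    ],
    store,
)
```

The point of the sketch is the ordering: every entry is classified once, at ingest time, so the severity is already stored alongside the message when it is later searched or used for alerting.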
Integration with ClusterVisor
LogVisor AI is tightly integrated with ClusterVisor, and because logs are classified with a severity level, alerts can be created automatically. For example, if LogVisor AI classifies a log message on node01 as critical, an alert can be automatically created for node01 with the contents of that log message. This allows you to be notified of problems with your cluster without having to manually create rules for every specific condition. It might even find issues you didn’t know existed.
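As a rough illustration of the node01 example, the following Python sketch turns classified log entries into alerts once they reach a severity threshold. The `Alert` type, the `alerts_for` function, and the dictionary layout of an entry are assumptions made for this example; they are not ClusterVisor's actual alerting interface.

```python
from dataclasses import dataclass

# Severity levels named in this document, from least to most severe.
SEVERITY_ORDER = ["neutral", "warning", "error", "critical"]


@dataclass
class Alert:
    """Hypothetical alert record: which host, how severe, and the log text."""
    host: str
    severity: str
    text: str


def alerts_for(entries, threshold="critical"):
    """Create one alert per entry whose severity is at or above the threshold."""
    min_rank = SEVERITY_ORDER.index(threshold)
    return [
        Alert(e["host"], e["severity"], e["message"])
        for e in entries
        if SEVERITY_ORDER.index(e["severity"]) >= min_rank
    ]


classified = [
    {"host": "node01", "severity": "critical", "message": "Kernel panic - not syncing"},
    {"host": "node02", "severity": "neutral", "message": "Started Session 12 of user alice"},
]
alerts = alerts_for(classified)
```

Because the rule keys on severity rather than on message text, no per-condition rule is needed for a new kind of critical message.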
LogVisor AI doesn’t stop at automatic alerting. Now that the entire cluster’s syslog data is in a database, ClusterVisor provides easy-to-use querying and searching capabilities not found with standard text-based log files. You have full Boolean search capabilities and can filter by AI severity, host, process, and more, which makes finding important log entries much easier. These queries are available both through the ClusterVisor web UI and through command-line tools.
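To show why a database makes this kind of filtering easy, here is a minimal sketch using Python's built-in `sqlite3` module. The `logs` table, its columns, and the `search` helper are hypothetical and chosen only for illustration; this document does not describe the database or query syntax that ClusterVisor actually uses.

```python
import sqlite3

# In-memory database with a made-up schema, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE logs (host TEXT, process TEXT, severity TEXT, message TEXT)"
)
conn.executemany(
    "INSERT INTO logs VALUES (?, ?, ?, ?)",
    [
        ("node01", "kernel", "critical", "Kernel panic - not syncing"),
        ("node01", "sshd", "neutral", "Accepted publickey for alice"),
        ("node02", "kernel", "warning", "CPU temperature above threshold"),
    ],
)


def search(conn, severity=None, host=None, process=None, contains=None):
    """Combine whichever filters are given with AND and return matching rows."""
    clauses, params = [], []
    for column, value in (("severity", severity), ("host", host), ("process", process)):
        if value is not None:
            clauses.append(f"{column} = ?")  # column names are fixed above, values are bound
            params.append(value)
    if contains is not None:
        clauses.append("message LIKE ?")
        params.append(f"%{contains}%")
    where = " AND ".join(clauses) if clauses else "1"
    sql = f"SELECT host, process, severity, message FROM logs WHERE {where} ORDER BY rowid"
    return conn.execute(sql, params).fetchall()


critical_on_node01 = search(conn, severity="critical", host="node01")
kernel_messages = search(conn, process="kernel")
```

With flat text files, the same questions would mean running text searches across every node's log files and parsing the results by hand.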
While AI log file classification is very helpful, some log entries are important only to your organization. For these cases, LogVisor AI offers a fallback method: revision filters that classify log entries using a simple text search or a regular expression. For example, if your institution has a policy of allowing SSH access only via key, a log message indicating that someone logged in with a password could constitute a critical alert. By creating a custom revision filter, you can elevate this otherwise benign message to a critical alert on your system, notifying you of a security problem that might easily be missed.
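The SSH example can be sketched as a small override step applied after the model's classification. The `RevisionFilter` class and `apply_filters` function below are hypothetical names used for illustration, not LogVisor AI's configuration format; the sample messages follow the "Accepted password for ..." and "Accepted publickey for ..." lines that OpenSSH's sshd writes on a successful login.

```python
import re
from dataclasses import dataclass


@dataclass
class RevisionFilter:
    """Hypothetical filter: a plain-text or regex pattern plus the severity to assign."""
    pattern: str
    severity: str
    is_regex: bool = False

    def matches(self, message):
        if self.is_regex:
            return re.search(self.pattern, message) is not None
        return self.pattern in message  # simple substring search


def apply_filters(message, model_severity, filters):
    """Return the severity of the first matching filter, else keep the model's result."""
    for f in filters:
        if f.matches(message):
            return f.severity
    return model_severity


filters = [
    # Site policy in this example: key-based SSH logins only.
    RevisionFilter(r"Accepted password for \S+", "critical", is_regex=True),
]

password_login = "Accepted password for alice from 10.0.0.5 port 50514 ssh2"
key_login = "Accepted publickey for alice from 10.0.0.5 port 50514 ssh2"

elevated = apply_filters(password_login, "neutral", filters)
unchanged = apply_filters(key_login, "neutral", filters)
```

In this sketch the first matching filter wins and the model's severity is kept when nothing matches; a real deployment might order or combine filters differently.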
While Advanced Clustering has put a lot of effort into building and training the log classification model to be as accurate as possible, it’s not perfect. New messages will come along that need to become part of the training dataset. To facilitate this, LogVisor AI allows you to flag incorrectly classified log messages for retraining. These messages can then be automatically sent to Advanced Clustering for training in the next model release.
To keep LogVisor AI current, it regularly checks our servers on the Internet for new models, then downloads and uses them when they are available. This lets your LogVisor AI installation get smarter over time.