Parsing textual input with IncQuery

Parsing textual notation has a long history in computer science. Over that time, many efficient parsing algorithms have been developed and then tuned to the extreme in compilers for various programming languages. In recent years, however, the development of integrated development environments has accelerated, adding more and more services on top of previously simple features such as code editing and automatic background compilation. Together with the rise of domain-specific language frameworks, which let the user define an arbitrary language declaratively, this means that present-day computers still struggle under the resource requirements of these technologies.

While waiting in front of a textual editor, I started wondering why the editor needs to re-run the parsing algorithm every time I hit a key (of course, I’m aware that the main cause of the slow responsiveness is probably not the parsing algorithm itself). A fully incremental approach could make these tools more responsive, which would spare their users a lot of pain. As I’m already familiar with an incremental query engine (IncQuery), I couldn’t get the idea of using it to parse text out of my mind.

The following article presents the experiment I’ve done as a proof of concept. It’s far from being useful and reveals more problems than it solves: it does not produce an AST, it just accepts or rejects the input string. However, it may be interesting as an approach, and maybe in the future it will be more than just a stupid idea.

EMF-IncQuery: First public release

One of the most common tasks when working with models is executing queries on them (e.g. finding a set of corresponding model elements for code generation or finding violations for model constraints) – quite often repeatedly. To allow effective calculation and recalculation, an index of the EMF model has to be created and maintained during editing operations.
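To make the idea of a maintained index concrete, here is a minimal sketch in plain Java. It is not EMF-IncQuery code and uses no EMF API: `TypeIndexSketch`, `Element`, `scan` and `lookup` are names made up for illustration, and the "type" string merely stands in for a metamodel class. The point is only the contrast between rescanning the model for every query and keeping an index up to date on each edit.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical stand-in for a model whose elements are indexed by type. */
public class TypeIndexSketch {

    /** A model element; "type" plays the role of a metamodel class name. */
    public static final class Element {
        public final String type;
        public final String name;

        public Element(String type, String name) {
            this.type = type;
            this.name = name;
        }
    }

    private final List<Element> contents = new ArrayList<>();
    private final Map<String, Set<Element>> byType = new HashMap<>();

    /** Editing operation: the index is updated together with the model. */
    public void add(Element e) {
        contents.add(e);
        byType.computeIfAbsent(e.type, k -> new LinkedHashSet<>()).add(e);
    }

    /** Editing operation: removal also removes the index entry. */
    public void remove(Element e) {
        if (contents.remove(e)) {
            Set<Element> bucket = byType.get(e.type);
            if (bucket != null) {
                bucket.remove(e);
                if (bucket.isEmpty()) {
                    byType.remove(e.type);
                }
            }
        }
    }

    /** Naive query: walks the whole model on every call. */
    public List<Element> scan(String type) {
        List<Element> result = new ArrayList<>();
        for (Element e : contents) {
            if (e.type.equals(type)) {
                result.add(e);
            }
        }
        return result;
    }

    /** Indexed query: a single map lookup, independent of model size. */
    public Set<Element> lookup(String type) {
        return Collections.unmodifiableSet(
                byType.getOrDefault(type, Collections.emptySet()));
    }

    public static void main(String[] args) {
        TypeIndexSketch model = new TypeIndexSketch();
        Element a = new Element("Class", "A");
        model.add(a);
        model.add(new Element("Attribute", "b"));
        System.out.println(model.lookup("Class").size()); // 1
        model.remove(a);
        System.out.println(model.lookup("Class").size()); // 0
    }
}
```

The price of the fast lookup is extra memory and a small amount of work on every edit, which is the usual trade-off of any indexing approach.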

EMF-IncQuery, a tool we are developing at the Budapest University of Technology and Economics, tries to solve this issue by providing a runtime library for these indexes, along with Eclipse-based tooling to specify and debug such queries. I have already blogged about a validation framework based on this technology, and my colleague, István Ráth, gave a short presentation at last year’s EclipseCon Europe Modeling Symposium.

However, previous tool demos used tooling and a query language originally created for the VIATRA2 model transformation framework, so the tool was somewhat hard to use. Since those demonstrations, we have created new, Xtext-based tooling with a modified query language that fits the EMF model specifications better.

The Xtext-based editor with type information hovers.

Another new user interface component is the Query Explorer: a view that allows evaluating developed queries without using the generated code in a new Eclipse instance, while allowing the reuse of existing domain editors.

The Query Explorer can be used to evaluate patterns using existing model editors.

As for examples, we have developed two new case studies for EMF-IncQuery since last year:

  1. Derived features: EMF derived features can be implemented with incremental queries as the backend;
  2. We also created connectors between query results and JFace data binding, allowing query-backed user interfaces that update automatically.

Even better, this new version has been available since Wednesday; for detailed release notes, see the announcement post on our homepage. If you want to download this release, you can use the Eclipse Marketplace client (at least if search is working as expected: today we managed to not find our software there using the built-in search 🙁 ). Alternatively, EMF-IncQuery is available from our update site: http://viatra.inf.mit.bme.hu/update/incquery/

Altogether, the new EMF-IncQuery release marks an important point: we believe it is now ready for use. For me, this is an important checkpoint, as in the last year I put considerable effort into getting this rolling.

As for the future, EMF-IncQuery is on track to become an Eclipse project: the creation review is scheduled for 10 October 2012. After the creation, we plan to migrate our codebase, and of course we have further ideas on how to improve the system so that it is usable in more and more cases.

Applications of high-performance model queries @ OMG/Eclipse workshop

At the last EclipseCon Europe, my colleague, István Ráth, presented our incremental model query approach at the Modeling Symposium. Since then, we have been working hard to evaluate further uses of the technology while creating better tooling support for the specification of those queries.

This year, during the OMG/Eclipse workshop held a day before EclipseCon 2012, István will present our approach together with our new results:

14:35 – 15:00

High performance queries and their novel applications

István Ráth, Research Associate, BME

High-performance model queries are still a major challenge for the industry standard Eclipse Modeling Framework (EMF), as they are intensively used in various model validation, model transformation or code generation scenarios. Existing EMF-based query technologies (like Eclipse OCL, EMF Model Query 1-2, or native Java programs) can have significant scalability issues for complex queries over models of more than 50,000 to 100,000 model elements. Moreover, it is often tedious and time-consuming to efficiently implement EMF-based queries manually on a case-by-case basis.

Recent initiatives, such as the EMF-IncQuery framework (http://viatra.inf.mit.bme.hu/incquery) have proposed innovative algorithms to mitigate this issue. EMF-IncQuery uses a graph query language, and provides incremental query evaluation by caching the results of the model queries and incrementally maintaining the cache when the underlying EMF model changes. Furthermore, the EMF-IncQuery framework can be easily integrated into existing EMF-based applications in a non-intrusive way.

In the first part of the talk, we overview the results of a thorough benchmark comparison intended to aid software engineers in picking the best tool for a given purpose. The measurements involve several versions of Eclipse OCL, manually optimized Java code, dedicated academic query and well-formedness checking tools, and EMF-IncQuery, and highlight the most important practical considerations of queries in model-driven tool design.

In the second part of the talk, we briefly overview novel and innovative uses of high-performance queries such as design-space exploration, whereby traditional modeling is augmented with AI techniques to aid the (semi-automatic) optimization of model-based software designs.
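The caching idea mentioned in the abstract above (keep the query results, and update them when the model changes instead of re-evaluating the query) can be illustrated with a small plain-Java sketch. This is only an illustration under my own assumptions, not the algorithm or API of EMF-IncQuery: `Node`, `Listener` and `UnnamedNodes` are hypothetical names, and the "query" is a single hard-coded constraint (nodes with an empty name).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of a query result cache maintained by change notifications. */
public class IncrementalQuerySketch {

    /** Change notification callback (stand-in for a model notification API). */
    public interface Listener {
        void nameChanged(Node node, String oldName, String newName);
    }

    /** A model element that notifies its listeners when its name changes. */
    public static final class Node {
        private String name;
        private final List<Listener> listeners = new ArrayList<>();

        public Node(String name) {
            this.name = name;
        }

        public String getName() {
            return name;
        }

        public void addListener(Listener listener) {
            listeners.add(listener);
        }

        public void setName(String newName) {
            String oldName = name;
            name = newName;
            for (Listener listener : listeners) {
                listener.nameChanged(this, oldName, newName);
            }
        }
    }

    /** Cached match set of the query "nodes whose name is null or empty". */
    public static final class UnnamedNodes implements Listener {
        private final Set<Node> matches = new LinkedHashSet<>();

        /** Start tracking a node and evaluate the condition once. */
        public void track(Node node) {
            node.addListener(this);
            update(node);
        }

        @Override
        public void nameChanged(Node node, String oldName, String newName) {
            // Only the changed node is re-checked; the rest of the cache is untouched.
            update(node);
        }

        private void update(Node node) {
            String name = node.getName();
            if (name == null || name.isEmpty()) {
                matches.add(node);
            } else {
                matches.remove(node);
            }
        }

        /** Reading the result is just returning the cache. */
        public Set<Node> getMatches() {
            return Collections.unmodifiableSet(matches);
        }
    }

    public static void main(String[] args) {
        Node engine = new Node("Engine");
        Node unnamed = new Node("");
        UnnamedNodes query = new UnnamedNodes();
        query.track(engine);
        query.track(unnamed);
        System.out.println(query.getMatches().size()); // 1
        unnamed.setName("Wheel");
        System.out.println(query.getMatches().size()); // 0
    }
}
```

Each edit costs a constant amount of work here, and reading the violations never requires traversing the model; for graph patterns spanning several elements the bookkeeping is of course much more involved.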

If you are interested in modeling techniques, I recommend attending the talk. As a teaser, here is an existing application of such AI-based techniques: automated quick fix generation for domain-specific modeling languages. We already presented this technique, albeit based on the VIATRA2 transformation framework, at VL/HCC 2011. The article, together with a video demonstration, is available from http://dx.doi.org/10.1109/VLHCC.2011.6070373

The video demonstration has also been uploaded to YouTube for easier viewing. (UPDATE: For feed readers, the video URL is the following: http://youtu.be/kPc4x01K7-s).

EMF-IncQuery @ EclipseCon Europe ’11

The research of my group often builds on various Eclipse technologies. We are the developers of the VIATRA2 model transformation framework. Recently we also created EMF-IncQuery, an incremental query technology over EMF models, which we believe is really useful in various areas, such as model validation during editing or model synchronization scenarios.

This year, a colleague of mine, István Ráth, will attend EclipseCon Europe and present the basic use cases of the tool during the Modeling Symposium. Below is a short abstract of the talk to give a basic idea:

EMF-IncQuery: Incremental evaluation of model queries over large EMF models

A Talk at the EclipseCon Europe 2011 Modeling Symposium
Presenter: István Ráth

High-performance model queries are still a major challenge for the industry standard Eclipse Modeling Framework (EMF), as they are intensively used in various model validation, model transformation or code generation scenarios.

Existing EMF-based query technologies (like OCL, EMF Query or native Java programs) have significant scalability issues for models over 50000 model elements. Moreover, it is often tedious and time consuming to efficiently implement EMF-based queries manually on a case-by-case basis.

This talk introduces EMF-IncQuery, a declarative and scalable EMF model query framework. EMF-IncQuery uses a graph query language, and provides incremental query evaluation by caching the results of the model queries and incrementally maintaining the cache when the underlying EMF model changes. Furthermore, the EMF-IncQuery framework can be easily integrated into existing EMF-based applications in a non-intrusive way.

During the talk, we quickly overview how easy it is to define and integrate highly scalable model queries into existing EMF-based applications, in the form of a very short live demonstration using the MDT Papyrus modeling tool. The scalability of the engine will also be demonstrated, with on-the-fly constraint revalidation that takes less than 100 milliseconds over large AUTOSAR models with over 1 million elements.

 

More info: http://viatra.inf.mit.bme.hu/incquery

Unfortunately, I cannot go to the conference, but I hope the 10th birthday of Eclipse will be celebrated well. Have fun, and if you are interested, come and listen to our talk.

Source code visualisation for my projects

Today I read the announcement of SourceCloud, a new Zest/Cloudio-based visualization for source code. As I like cool visualizations, I tried it out on the open projects I am participating in. First I created the cloud for the source code of the Debug Visualisation project. Overall, it consists of about half Eclipse debugger/UI and Java keywords; as this is quite a small project (about 4 kLOC), this is somewhat expected with 200 words.

SourceCloud of the Debug Visualisation Project

However, it is interesting that the strings ‘end_of_line‘ and ‘sp_cleanup‘ found their way in. Luckily, only a few one-character names and numbers are present, meaning we managed to create longer variable names…

The cloud of the VIATRA2 model transformation framework is more interesting: it is a much larger codebase, developed over several years by around a dozen people. Because of the size, I generated the cloud using 300 words, but almost no Java keywords found their way into the graph.

SourceCloud of the VIATRA2 framework

However, the mandatory comment parts did, e.g. nbsp or the names of some creators. A lot of EMF-specific keywords are also present. They are present because we have quite a large EMF metamodel that provides several hundred generated Java files.

Last but not least, I also checked the cloud of the EMF-IncQuery project. It is smaller than the VIATRA2 project, but larger than Debug Visualisation. It also has a shorter history, but several key committers.

SourceCloud of the EMF-IncQuery project

My personal favorite is this diagram, as it shows a lot of nice things: e.g. four digit numbers in large quantities (magic numbers FTW) or the absolute champion string: ‘HUKfyM7ahfg‘. All of them are present in the Ecore Diagram of the defined metamodel (the meaningless string 152 times as a postfix of an identifier 🙂 ). At least, I learned something about the internal model representation of GMF.

And what do your projects look like?