
Parsing plays an important role in static program analysis: during this step a structural representation of the code is created, upon which further analysis is performed. Parser generator tools, given a syntax specification, automate parser development. Language documentation often acts as such a specification. This documentation usually takes the form of an ambiguous grammar in Extended Backus-Naur Form (EBNF), which most parser generators fail to process. Automatic grammar transformation generally leads to a decrease in parsing performance. Some approaches support EBNF grammars natively, but they all fail to handle ambiguous grammars. On the other hand, the Generalized LL (GLL) parsing algorithm admits arbitrary context-free grammars and achieves good performance, but cannot handle EBNF grammars. The main contribution of this paper is a modification of the GLL algorithm which can process grammars in a form closely related to EBNF (Extended Context-Free Grammar). We also show that the modification improves parsing performance compared to the grammar transformation-based approach.

Also contains a full GLL implementation.
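For context, EBNF operators are usually desugared into plain BNF with fresh nonterminals before a GLL parser can run; this is exactly the transformation whose performance cost the paper's modification avoids. A minimal sketch in Python (the grammar encoding and helper names here are mine, not the paper's):

```python
# EBNF rule with a repetition operator: A ::= a { b }
ebnf = {"<A>": [["a", ("rep", "b")]]}

def to_bnf(ebnf):
    """Rewrite each {x} repetition into a fresh right-recursive nonterminal."""
    bnf, counter = {}, [0]
    def fresh():
        counter[0] += 1
        return "<_rep%d>" % counter[0]
    for nt, alts in ebnf.items():
        new_alts = []
        for alt in alts:
            new_alt = []
            for sym in alt:
                if isinstance(sym, tuple) and sym[0] == "rep":
                    r = fresh()
                    bnf[r] = [[sym[1], r], []]  # <_rep1> ::= b <_rep1> | eps
                    new_alt.append(r)
                else:
                    new_alt.append(sym)
            new_alts.append(new_alt)
        bnf[nt] = new_alts
    return bnf

print(to_bnf(ebnf))
# {'<_rep1>': [['b', '<_rep1>'], []], '<A>': [['a', '<_rep1>']]}
```

Each synthetic nonterminal such as `<_rep1>` adds descriptor and stack traffic during GLL parsing, which is why handling the ECFG operators natively pays off.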

 

This paper introduces sets of binary subtree representations as an alternative to shared packed parse forests as the output of a generalised parser, and shows how these may be generated by Earley's algorithm, by a new GLL-style parser, and by Johnson's continuation-passing combinator-style parsers. The set-based output removes the clerical overhead associated with graph construction, making the parsers simpler.
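For intuition about the set-based output, here is a hypothetical sketch of what binary subtree representation (BSR) elements look like; the tuple layout follows the usual (slot, left extent, pivot, right extent) shape, but the grammar and names are mine:

```python
# A BSR element is a flat tuple (X ::= alpha . beta, i, k, j): alpha
# derives input[i:j], and the last symbol of alpha derives input[k:j].
# The parser's output is simply a set of such tuples instead of a
# pointer-heavy shared packed parse forest.
Upsilon = set()

def bsr_add(slot, i, k, j):
    Upsilon.add((slot, i, k, j))

# For the grammar S ::= a S | a and the input "aa", a recogniser emits:
bsr_add("S ::= a . S", 0, 0, 1)   # leading 'a' matched over input[0:1]
bsr_add("S ::= a .", 1, 1, 2)     # inner S derives input[1:2]
bsr_add("S ::= a S .", 0, 1, 2)   # outer S derives input[0:2], pivot at 1

# Extracting a tree becomes a matter of set lookups, not graph traversal.
for e in sorted(Upsilon):
    print(e)
```

Ambiguity shows up simply as two tuples sharing a slot and extents but differing in pivot, which is where the clerical simplicity over SPPF construction comes from.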

 

Generalized LL (GLL) parsing is a relatively new technique for parsing arbitrary context-free grammars while achieving a cubic bound on the running time. By overcoming the limitations of traditional LL recursive descent parsers, GLL parsers can support a larger class of grammars using the same reasoning as in LL parsing. In contrast to alternative parsing algorithms, GLL parsers are known for their straightforward implementation and efficiency. Since this type of parsing is still in its infancy, many techniques, such as error recovery, have yet to be developed. The design of these techniques, however, often requires a detailed understanding of how the underlying parser deals with phenomena such as non-determinism and left recursion. Since the GLL algorithm considers multiple stacks in parallel, the actual control flow of the algorithm can become rather complex, making it more difficult for engineers to enhance the algorithm with new extensions. In this project, we try to overcome this issue by explaining how a GLL parser actually works using visualization techniques. By making the relationships between GLL parsing, LL parsing, and the input data explicit, the learning curve of the algorithm can be significantly reduced, making it easier to develop new extensions for GLL parsing. Furthermore, the step-by-step visualization of the algorithm can be used as a debugging tool to test the behaviour of grammars at early stages of language development.
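The "multiple stacks in parallel" are conventionally realised as a graph-structured stack (GSS), which is also the natural object for such a visualization to draw. A rough sketch of the data structure, with naming that is illustrative rather than taken from this project:

```python
# Graph-structured stack (GSS): stack tops are shared nodes keyed by a
# (grammar slot, input position) pair, and edges point to the callers to
# return to on a pop. Sharing is what lets GLL run many LL-style parse
# attempts without exponential blowup, and what makes its control flow
# harder to follow than a single recursive-descent call stack.
class GSSNode:
    def __init__(self, slot, pos):
        self.slot, self.pos = slot, pos
        self.edges = []          # caller nodes to pop back to

nodes = {}

def gss_node(slot, pos):
    """One shared node per (slot, pos): parallel parse attempts merge here."""
    key = (slot, pos)
    if key not in nodes:
        nodes[key] = GSSNode(slot, pos)
    return nodes[key]
```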

 

The Earley parser is a well-known method for parsing context-free grammars. While less efficient in practical contexts than other generalized context-free parsing algorithms, such as GLR, it is also more general. As such it can be used as a foundation on which to build more complex parsing algorithms. We present a new, virtual machine based approach to parsing, heavily based on the original Earley parser. We show how to translate grammars into virtual machine instruction sequences that are then used by the parsing algorithm. Additionally, we introduce an optimization that merges shared rule prefixes to increase parsing performance. Finally, we present and evaluate an implementation of a Scannerless Earley Virtual Machine called north.
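To give a flavour of the grammar-to-instructions idea, here is a hypothetical compilation scheme; the instruction names are mine and will differ from north's actual instruction set:

```python
# Compile one rule into a linear instruction sequence, e.g.
#   E ::= E '+' T   becomes   call <E>; match '+'; call <T>; reduce <E>, 3
def compile_rule(lhs, rhs):
    code = []
    for sym in rhs:
        if sym.startswith('<'):
            code.append(('call', sym))      # descend into a nonterminal
        else:
            code.append(('match', sym))     # consume a terminal
    code.append(('reduce', lhs, len(rhs)))  # complete the rule
    return code

print(compile_rule('<E>', ['<E>', '+', '<T>']))
# [('call', '<E>'), ('match', '+'), ('call', '<T>'), ('reduce', '<E>', 3)]
```

Under this reading, the shared-prefix optimization amounts to emitting a common instruction prefix once, as in a trie over the instruction sequences of alternatives.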

 

Social computing prototypes probe the social behaviors that may arise in an envisioned system design. This prototyping practice is currently limited to recruiting small groups of people. Unfortunately, many challenges do not arise until a system is populated at a larger scale. Can a designer understand how a social system might behave when populated, and make adjustments to the design before the system falls prey to such challenges? We introduce social simulacra, a prototyping technique that generates a breadth of realistic social interactions that may emerge when a social computing system is populated. Social simulacra take as input the designer's description of a community's design -- goal, rules, and member personas -- and produce as output an instance of that design with simulated behavior, including posts, replies, and anti-social behaviors. We demonstrate that social simulacra shift the behaviors that they generate appropriately in response to design changes, and that they enable exploration of "what if?" scenarios where community members or moderators intervene. To power social simulacra, we contribute techniques for prompting a large language model to generate thousands of distinct community members and their social interactions with each other; these techniques are enabled by the observation that large language models' training data already includes a wide variety of positive and negative behavior on social media platforms. In evaluations, we show that participants are often unable to distinguish social simulacra from actual community behavior and that social computing designers successfully refine their social computing designs when using social simulacra.
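A rough sketch of the prompting pattern described above: expand one community description into many persona-conditioned prompts, one per simulated member. The prompt wording and the `generate` placeholder are mine; the paper's actual prompt chains are considerably richer:

```python
# Build one persona-conditioned prompt per simulated community member.
# generate() stands in for any large-language-model completion call.
def member_prompt(community, persona, thread):
    return (
        f"Community: {community['goal']}\n"
        f"Rules: {'; '.join(community['rules'])}\n"
        f"You are {persona}.\n"
        f"Thread so far:\n{thread}\n"
        f"Write your reply:"
    )

community = {
    "goal": "a support forum for new parents",
    "rules": ["be kind", "no medical advice"],
}
personas = ["a sleep-deprived first-time dad", "a veteran mom of four"]
for p in personas:
    prompt = member_prompt(community, p, "OP: any tips for night feeds?")
    # reply = generate(prompt)   # one simulated member's post
```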

 

Research has demonstrated that integrating software testing in programming courses has benefits and drawbacks. This work presents Test Informed Learning with Examples (TILE), a proposal to improve the teaching of testing in introductory programming courses. We argue why we think TILE can address most of the known drawbacks.

 

We propose a method for using a large language model, such as GPT-3, to simulate responses of different humans in a given context. We test our method by attempting to reproduce well-established economic, psycholinguistic, and social experiments. The method requires prompt templates for each experiment. Simulations are run by varying the (hypothetical) subject details, such as name, and analyzing the text generated by the language model. To validate our methodology, we use GPT-3 to simulate the Ultimatum Game, garden path sentences, risk aversion, and the Milgram Shock experiments. In order to address concerns of exposure to these studies in training data, we also evaluate simulations on novel variants of these studies. We show that it is possible to simulate responses of different people and that their responses are largely consistent with prior human studies from the literature. Using large language models as simulators offers advantages but also poses risks. Our use of a language model for simulation is contrasted with anthropomorphic views of a language model as having its own behavior.
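As a concrete illustration of the template idea, here is a hypothetical Ultimatum Game prompt that varies only the subject's name and the offer; the wording is mine, not the paper's:

```python
# One template per experiment; vary the (hypothetical) subject details
# and read the decision off the model's completion.
TEMPLATE = (
    "{name} is offered a split of $10: {name} receives ${offer} and the "
    "proposer keeps ${keep}. {name} decides to"
)

def ultimatum_prompt(name, offer):
    return TEMPLATE.format(name=name, offer=offer, keep=10 - offer)

for name in ["Alice", "Rahul", "Wei"]:
    for offer in [1, 3, 5]:
        prompt = ultimatum_prompt(name, offer)
        # completion = lm(prompt)  # classify as "accept" / "reject"
```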

 

We analyze the storage and recall of factual associations in autoregressive transformer language models, finding evidence that these associations correspond to localized, directly-editable computations. We first develop a causal intervention for identifying neuron activations that are decisive in a model's factual predictions. This reveals a distinct set of steps in middle-layer feed-forward modules that mediate factual predictions while processing subject tokens. To test our hypothesis that these computations correspond to factual association recall, we modify feed-forward weights to update specific factual associations using Rank-One Model Editing (ROME). We find that ROME is effective on a standard zero-shot relation extraction (zsRE) model-editing task. We also evaluate ROME on a new dataset of difficult counterfactual assertions, on which it simultaneously maintains both specificity and generalization, whereas other methods sacrifice one or the other. Our results confirm an important role for mid-layer feed-forward modules in storing factual associations and suggest that direct manipulation of computational mechanisms may be a feasible approach for model editing. Website at https://rome.baulab.info/
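A toy version of the rank-one idea: pick a key vector for the subject and a target value vector, then add a rank-one term so the edited weights map the key exactly to the value. This omits ROME's key-covariance whitening, so it is a simplification of, not the paper's exact, update:

```python
import numpy as np

# Rank-one edit: W' = W + (v - W k) k^T / (k^T k), so that W' k = v.
# ROME itself uses C^-1 k in place of k, where C is a key covariance
# matrix estimated from data; that refinement is omitted here.
rng = np.random.default_rng(0)
d = 8
W = rng.normal(size=(d, d))        # a feed-forward weight matrix
k = rng.normal(size=d)             # key: subject representation
v = rng.normal(size=d)             # value: desired factual output

W_edit = W + np.outer(v - W @ k, k) / (k @ k)   # rank-one update

assert np.allclose(W_edit @ k, v)  # edited weights now recall the new fact
```

Because the correction is rank one, directions orthogonal to the key are barely disturbed, which is the intuition behind ROME preserving specificity while generalizing the edited fact.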